Chen Guo wrote:
> Hi all,
> This is mostly a step towards multithreaded sort the unix way, but as
> Padraig mentioned, has its other uses.
Thanks again for looking at this.

> Parsing and I/O are not my strong suits, so I have a couple of questions:
>
> Are there more appropriate functions than open and pread to use here? I
> usually see wrapper functions called in place of actual functions like fopen,
> fread, etc, and it feels rather inappropriate for me to use open and pread
> here.
>
> And are there any suggestions for parsing the --chunk option in a better
> way? I feel having two separate options specifying both required values is
> redundant, so I decided to separate the values by a comma, as Jim had in an
> example he linked me. The way I wrote it, it feels like a hacked workaround,
> but I'm not sure how else to get around that comma.

That's pretty much what I was thinking from the first mail I quoted:

  The `read_chunk` process above is currently awkward and inefficient
  to implement with dd and split. As a first step I think it would be
  very useful to add a --number option to `split`, like:

    --number=5    #split input into 5 chunks
    --number=2/5  #output chunk 2 of 5 to [to stdout]

  In text mode, this should handle only splitting on a line boundary,
  and adding any extra data into the last chunk.

I do think --number is more general than --chunk, as it allows you
to specify only 1 number to get the behaviour described above.
Also I notice that FreeBSD's split recently got a '-n chunk_count' option,
so it would be good to maintain compat with that if possible.

We also need to decide how to select between text and binary modes
for --number. Note that reading from non-seekable input complicates things.
For binary data I don't see how one could support --number.

> Also, any opinions on how the lines should be output? As of now I just
> have it as stdout, since that's how I see sort would use it. And of course,
> anything else I missed/could've done better? Thanks a lot guys.

It makes sense to just send the single "chunk" to stdout.

cheers,
Pádraig.
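The text-mode --number behaviour discussed above (parse `N` or `K/N`, split only on line boundaries, put any extra data into the last chunk, send the one chunk to stdout) can be sketched in C. This is a minimal sketch under stated assumptions, not split's actual code: `parse_number_spec`, `next_line_start`, `line_boundary`, `chunk_bounds` and `copy_range` are hypothetical names, the input is assumed to be a seekable regular file of known size (non-seekable input is not handled), and a line is assigned to the chunk in which its terminating newline falls. It uses POSIX pread, so probing for boundaries never moves the file offset.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: parse "N" or "K/N" (e.g. "5" or "2/5").
   On success sets *n, and sets *k to 0 when no K was given. */
static int
parse_number_spec (const char *arg, unsigned long *k, unsigned long *n)
{
  char *end;
  if (!isdigit ((unsigned char) *arg))
    return -1;
  errno = 0;
  unsigned long first = strtoul (arg, &end, 10);
  if (errno)
    return -1;
  if (*end == '\0')
    {
      *k = 0;
      *n = first;
    }
  else if (*end == '/' && isdigit ((unsigned char) end[1]))
    {
      *k = first;
      *n = strtoul (end + 1, &end, 10);
      if (errno || *end != '\0' || *k == 0)
        return -1;
    }
  else
    return -1;                  /* e.g. "2,5" is rejected */
  return (*n == 0 || *k > *n) ? -1 : 0;
}

/* Return the offset just past the first '\n' at or after POS, or SIZE
   if no newline follows.  pread leaves the file offset untouched. */
static off_t
next_line_start (int fd, off_t pos, off_t size)
{
  char buf[4096];
  while (pos < size)
    {
      ssize_t got = pread (fd, buf, sizeof buf, pos);
      if (got <= 0)
        return -1;              /* read error, or the file shrank */
      char *nl = memchr (buf, '\n', got);
      if (nl)
        return pos + (nl - buf) + 1;
      pos += got;
    }
  return size;
}

/* Move a nominal byte offset OFF forward to the start of a line. */
static off_t
line_boundary (int fd, off_t off, off_t size)
{
  return off <= 0 ? 0 : off >= size ? size : next_line_start (fd, off - 1, size);
}

/* Compute [*start, *end) of chunk K (1-based) of N.  Nominal chunks are
   SIZE/N bytes, the last one also takes the remainder, and both ends are
   pushed forward to a line start so that no line is split. */
static int
chunk_bounds (int fd, off_t size, unsigned long k, unsigned long n,
              off_t *start, off_t *end)
{
  assert (1 <= k && k <= n);
  off_t chunk = size / (off_t) n;
  off_t s = (off_t) (k - 1) * chunk;
  off_t e = k == n ? size : (off_t) k * chunk;
  *start = line_boundary (fd, s, size);
  *end = line_boundary (fd, e, size);
  return (*start < 0 || *end < 0) ? -1 : 0;
}

/* Copy bytes [START, END) of FD to standard output. */
static int
copy_range (int fd, off_t start, off_t end)
{
  char buf[65536];
  while (start < end)
    {
      size_t want = end - start < (off_t) sizeof buf
                    ? (size_t) (end - start) : sizeof buf;
      ssize_t got = pread (fd, buf, want, start);
      if (got <= 0 || fwrite (buf, 1, got, stdout) != (size_t) got)
        return -1;
      start += got;
    }
  return 0;
}
```

For a file containing `aa\nbb\ncc\ndd\n` (12 bytes), chunk 1 of 2 is bytes [0, 6), i.e. `aa\nbb\n`, and chunk 2 of 3 is bytes [6, 9), i.e. `cc\n`. Because each chunk's end and the next chunk's start are computed from the same nominal offset, adjacent chunks meet exactly, so concatenating chunks 1..N reproduces the input. If the file is smaller than N bytes, the earlier chunks come out empty and everything lands in the last one.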