I agree, there's no easy way around this one without separate interfaces (one where the caller keeps the counts, and one where the writable keeps the counts), and that would be silly.
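A minimal Java sketch of the two alternatives. The class and method names are hypothetical and are not Hadoop APIs. In style A the writable keeps the count, much as a Writable's `readFields(DataInput)` receives no length. In style B the caller keeps the count and passes it in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ReadStyles {

    // Hypothetical style A: the writable keeps the count. It writes its own
    // 4-byte length prefix, so readFields needs only the stream.
    static class SelfCountingBytes {
        byte[] bytes = new byte[0];

        void write(DataOutput out) throws IOException {
            out.writeInt(bytes.length);
            out.write(bytes);
        }

        void readFields(DataInput in) throws IOException {
            bytes = new byte[in.readInt()];
            in.readFully(bytes);
        }
    }

    // Hypothetical style B: the caller keeps the count. Only the raw bytes
    // are written, and the reader must be told how many to consume.
    static class CallerCountedBytes {
        byte[] bytes = new byte[0];

        void write(DataOutput out) throws IOException {
            out.write(bytes);
        }

        void readFields(DataInput in, int length) throws IOException {
            bytes = new byte[length];
            in.readFully(bytes);
        }
    }

    static byte[] serializeA(byte[] payload) {
        try {
            SelfCountingBytes w = new SelfCountingBytes();
            w.bytes = payload;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] deserializeA(byte[] data) {
        try {
            SelfCountingBytes r = new SelfCountingBytes();
            r.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return r.bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] serializeB(byte[] payload) {
        try {
            CallerCountedBytes w = new CallerCountedBytes();
            w.bytes = payload;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The caller must supply the length it stored somewhere else.
    static byte[] deserializeB(byte[] data, int length) {
        try {
            CallerCountedBytes r = new CallerCountedBytes();
            r.readFields(new DataInputStream(new ByteArrayInputStream(data)), length);
            return r.bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a 3-byte payload, `serializeA` produces 7 bytes (a 4-byte length plus the data) and `serializeB` produces 3. However, `deserializeB` works only if the caller kept that 3 elsewhere. Maintaining both styles side by side is the duplication described above as silly.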
However, it still seems to me that the key length in the sequence file is redundant. Each key must write its own length, know its own length, or be able to figure it out (even via the high-speed interface), so there's no reason to store the key length in the file as well.

Why do I care about 4 bytes per record? Because we're integrating an external sort, and right now it has to handle records that carry two key lengths. I also assume that others (such as Yahoo) will want to incorporate an external sort. And if we're going to read sequence files from another language, we should be sure about the format to use.

Thanks!
Paul

On 6/26/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Eric Baldeschwieler wrote:
> Can we turn this around and assume that writables will be given a stream
> and a length when they read? That would also let us remove redundant
> info...

Unless I misunderstand, that would make it harder to nest writables, since all containers would need to store the length. Currently only top-level containers (SequenceFile and the RPC protocol) need to write lengths. Even these are optional, used only to optimize things.

Doug
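To make the 4-byte question concrete, here is a hypothetical record layout loosely modeled on the thread's description: a record length, an optional key length, then the key and the value. Keys and values are serialized with `DataOutput.writeUTF`, which prefixes its own 2-byte length, so they stand in for a self-delimiting key. The layout, the class name `RecordLayout`, and what `recordLength` counts are assumptions for this sketch, not the actual SequenceFile format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RecordLayout {

    // Serializes a string as a self-delimiting field: writeUTF emits a
    // 2-byte length followed by the modified-UTF-8 bytes.
    static byte[] utf(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF(s);
        return buf.toByteArray();
    }

    // Hypothetical layout: [recordLength:int][keyLength:int, optional][key][value].
    // recordLength counts key + value bytes (an assumption of this sketch).
    static byte[] writeRecord(String key, String value, boolean withKeyLength) {
        try {
            byte[] k = utf(key);
            byte[] v = utf(value);
            ByteArrayOutputStream rec = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(rec);
            out.writeInt(k.length + v.length);
            if (withKeyLength) {
                out.writeInt(k.length); // the 4 bytes in question: k already encodes its length
            }
            out.write(k);
            out.write(v);
            return rec.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads key and value back. A stored keyLength is skipped unused: the
    // self-delimiting key tells the reader where it ends and the value begins.
    static String[] readRecord(byte[] record, boolean withKeyLength) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            in.readInt();            // recordLength, not needed to parse the fields
            if (withKeyLength) {
                in.readInt();        // keyLength, redundant
            }
            String key = in.readUTF();
            String value = in.readUTF();
            return new String[] {key, value};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withLen = writeRecord("hello", "world", true);
        byte[] withoutLen = writeRecord("hello", "world", false);
        System.out.println(withLen.length - withoutLen.length); // prints 4
        String[] kv = readRecord(withoutLen, false);
        System.out.println(kv[0] + " " + kv[1]);                // prints hello world
    }
}
```

For `("hello", "world")` the record is 22 bytes with the key length and 18 bytes without it, and `readRecord` recovers both fields either way. The reader finds the value only because the key is self-delimiting. Doug relies on the same property for nesting: a container can concatenate self-delimiting children without storing a separate length for each one.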