On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> >
> > So I think we all agree to do payloads by reference (do not make a
> > copy of byte[] like termBuffer does), and to allow payload reuse.
> >
> > So now we have 3 viable options still on the table, I think:
> > Token{ byte[] payload, int payloadLength, ...}
> > Token{ byte[] payload, int payloadOffset, int payloadLength, ...}
> > Token{ Payload p, ... }
> >
>
> I'm for option 2. I agree that it is worthwhile to allow filters to
> modify the payloads. And I'd like to optimize for the case where lots
> of tokens have payloads, and option 2 seems therefore the way to go.
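For concreteness, here is a minimal sketch of options 2 and 3, and of the reuse pattern discussed below. The class and field names are hypothetical, simplified illustrations, not the actual Lucene API:

```java
// Hypothetical, simplified sketch of the API options under discussion;
// class and field names are illustrative, not real Lucene code.

// Option 2: payload stored directly on the Token as byte[] + offset + length.
class Token2 {
    char[] termBuffer;
    byte[] payload;       // held by reference, no copy
    int payloadOffset;
    int payloadLength;

    void setPayload(byte[] data, int offset, int length) {
        this.payload = data;
        this.payloadOffset = offset;
        this.payloadLength = length;
    }
}

// Option 3: payload wrapped in a separate, reusable object.
class Payload {
    byte[] data;
    int offset;
    int length;
}

class Token3 {
    char[] termBuffer;
    Payload payload;      // one extra reference field per Token
}

public class PayloadOptions {
    public static void main(String[] args) {
        // A filter could reuse one Payload instance for every token in a
        // field, so option 3 costs roughly one allocation per field.
        Payload shared = new Payload();
        Token3 a = new Token3();
        Token3 b = new Token3();
        a.payload = shared;
        b.payload = shared;

        shared.data = new byte[] {42};
        shared.length = 1;

        // Both tokens see the same bytes: reuse, not copying.
        System.out.println(a.payload.data[0]);       // prints 42
        System.out.println(a.payload == b.payload);  // prints true
    }
}
```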
Just to play devil's advocate: adding the byte[] directly to Token gains
less than we might have thought, given that we have reuse in any case. A
TokenFilter could reuse the same Payload object for every term in a Field,
so the CPU/allocation savings is closer to a single Payload allocation per
field that uses payloads.

On the other hand, if we used a Payload object, it would save 8 bytes per
Token for fields not using payloads. Beyond the initial allocation per
field, the additional cost of a Payload field is one extra dereference
(which should be really minor).

So I'm a bit more on the fence... Thoughts?

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]