"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> > Yonik Seeley wrote:
> > >
> > > So I think we all agree to do payloads by reference (do not make a
> > > copy of the byte[] the way termBuffer does), and to allow payload reuse.
> > >
> > > So now I think we still have 3 viable options on the table:
> > > Token{ byte[] payload, int payloadLength, ...}
> > > Token{ byte[] payload, int payloadOffset, int payloadLength,...}
> > > Token{ Payload p, ... }
> > >
> >
> > I'm for option 2. I agree that it is worthwhile to allow filters to
> > modify the payloads. And I'd like to optimize for the case where lots
> > of tokens have payloads, and option 2 therefore seems the way to go.
> 
> Just to play devil's advocate, it seems like adding the byte[]
> directly to Token gains less than we might have thought, given that
> we have reuse in any case.  A TokenFilter could reuse the same
> Payload object for each term in a Field, so the allocation savings
> amount to roughly a single Payload per field that uses payloads.
> 
> If we used a Payload object, it would save 8 bytes per Token for
> fields not using payloads.
> Besides an initial allocation per field, the only added cost of a
> Payload field would be one extra dereference (but that should be
> really minor).

These are excellent points.  I guess I would lean [back] towards
keeping the separate Payload object and extending its API to allow
reuse and modification of its byte[]?
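
To make that concrete, something along these lines maybe (just a
sketch, not a committed API -- the method names are made up):

  public class Payload {
    // held by reference: we never copy the caller's array
    private byte[] data;
    private int offset;
    private int length;

    // point this Payload at (a slice of) an existing buffer, so a
    // TokenFilter can allocate one Payload per field and reuse it
    public void setData(byte[] data, int offset, int length) {
      this.data = data;
      this.offset = offset;
      this.length = length;
    }

    public byte[] getData() { return data; }
    public int getOffset() { return offset; }
    public int getLength() { return length; }
  }

A filter would then allocate a single Payload up front and just call
setData() once per token, which gets us the one-allocation-per-field
cost Yonik described.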

I'm now even wondering whether the char[] termBuffer should be by
reference (again!), too?  This would save one copy for those
TokenStreams that could provide a reference to their own char[]
buffers (e.g. CharTokenizer).
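
Hypothetically, something like this (not what Token does today, and
the offset field is made up for the sketch):

  public class Token {
    private char[] termBuffer;
    private int termBufferOffset;
    private int termBufferLength;

    // share the stream's char[] instead of copying it, so e.g.
    // CharTokenizer could hand us a reference to its own buffer
    public void setTermBuffer(char[] buffer, int offset, int length) {
      this.termBuffer = buffer;  // no System.arraycopy here
      this.termBufferOffset = offset;
      this.termBufferLength = length;
    }
  }

The flip side is the same aliasing question as with payloads: the
stream would have to promise not to overwrite its buffer until the
consumer is done with the Token.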

Mike
