Yeah, I would like to see a more "term-vector"/SAX-like API for extracting values that requires no extra object overhead as well.
Pass in a "collector" whose methods are called as fields are encountered (a method can return false if walking the document should stop, or some Enum for more options).

I just throw away the Lucene Document and Field objects when I'm done with them anyway (well, I'll cache them in an LRU cache for later reuse, but I could do smarter things if I didn't need the Lucene Document object in the first place).

-- Tim

Earwin Burrfoot wrote:
> Missed that, I have a heap of unread Jira mails :/
>
> Okay, you're reusing the Document object and the list inside. To reuse
> Fieldable instances you'd have to do some very awkward things.
> More awkward things are required to extract your longed-for values
> from the Document.
> To add insult to injury, Document and Fieldable define a boatload of
> stuff that is used at indexing time, but has zero meaning at
> search time.
> This is just a broken, quickly-hacked-together API.
>
> 2010/2/25 Tim Smith <[email protected]>:
>
>> I created LUCENE-2276 a couple of days ago to at least allow reusing
>> Document objects (didn't see any interest from anyone though)
>>
>> -- Tim
>>
>> Erick Erickson wrote:
>>
>> OK, never mind <G>....
>> Erick
>>
>> On Thu, Feb 25, 2010 at 1:48 PM, Earwin Burrfoot <[email protected]> wrote:
>>
>>> My issue is with extra objects created in the process. Field selection
>>> can be handled with, well, FieldSelector.
>>>
>>> 2010/2/25 Erick Erickson <[email protected]>:
>>>
>>>> Does LazyLoading address this? I'm assuming your issue is
>>>> that the default behavior loads the entire document regardless
>>>> of whether you actually want all the fields.....
>>>> Erick
>>>>
>>>> On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot <[email protected]>
>>>> wrote:
>>>>
>>>>> I'm thinking, should Lucene introduce a new interface to read stored
>>>>> document fields?
>>>>>
>>>>> The current 'Document document(int n)' mechanism is barely usable due
>>>>> to the overhead involved. While I believe the underlying index
>>>>> structure works pretty fast (if it fits in memory, as is the case for
>>>>> most performance-concerned installations), there's no adequate access
>>>>> to it, and people are forced to introduce contraptions like LinkedIn's
>>>>> payload-assisted luceneId<->appId mapping or similar caches we employ.
>>>>>
>>>>> What I am thinking about is something along the lines of existing
>>>>> iterators like TermDocs/TermPositions. Iterate over docs, then iterate
>>>>> over fields stored for each, extract data, ???, profit.
>>>>> Comments?
>>>>>
>>>>> --
>>>>> Kirill Zakharenko/Кирилл Захаренко ([email protected])
>>>>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>>>>> ICQ: 104465785
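
To make the collector idea from the top of this message a bit more concrete, here is a rough Java sketch of how such a callback-style walk might look. None of these names exist in Lucene: StoredFieldCollector, Action, walk and SingleFieldCollector are all made up for illustration, and the Map is only a stand-in for whatever the stored-fields reader would decode from the index. The point is just the shape of the API: the walker calls back per field, and the collector can tell it to stop early.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StoredFieldWalkSketch {

    /** Hypothetical return value: what the walker should do after a callback. */
    enum Action { CONTINUE, STOP }

    /** Hypothetical callback interface; not part of Lucene. */
    interface StoredFieldCollector {
        Action stringField(String name, String value);
        Action binaryField(String name, byte[] value);
    }

    /**
     * Stand-in for the stored-fields reader. Walks one document's fields in
     * order and hands each value to the collector, with no Document or Field
     * objects in between. Returns how many fields were visited.
     * (Assumption: values in this toy map are either String or byte[].)
     */
    static int walk(Map<String, Object> storedFields, StoredFieldCollector collector) {
        int visited = 0;
        for (Map.Entry<String, Object> e : storedFields.entrySet()) {
            visited++;
            Object v = e.getValue();
            Action next = (v instanceof byte[])
                    ? collector.binaryField(e.getKey(), (byte[]) v)
                    : collector.stringField(e.getKey(), (String) v);
            if (next == Action.STOP) {
                break; // collector has what it needs, skip the remaining fields
            }
        }
        return visited;
    }

    /** Example collector: grabs a single string field, then stops the walk. */
    static final class SingleFieldCollector implements StoredFieldCollector {
        private final String wanted;
        String value; // stays null if the field is never seen

        SingleFieldCollector(String wanted) {
            this.wanted = wanted;
        }

        @Override
        public Action stringField(String name, String value) {
            if (!name.equals(wanted)) {
                return Action.CONTINUE;
            }
            this.value = value;
            return Action.STOP;
        }

        @Override
        public Action binaryField(String name, byte[] value) {
            return Action.CONTINUE; // not interested in binary fields
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put("title", "hello");
        doc.put("body", "never reached");

        SingleFieldCollector c = new SingleFieldCollector("title");
        int visited = walk(doc, c);
        System.out.println(c.value + " after " + visited + " fields"); // hello after 2 fields
    }
}
```

An Enum return value is used here rather than a boolean because it leaves room for more options later (say, skipping the rest of a document but continuing with the next one); a plain boolean would work just as well for the simple stop/continue case.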
