[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

Eks Dev (JIRA) Tue, 05 Aug 2008 12:55:36 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620019#action_12620019
 ]


Eks Dev commented on LUCENE-1219:
---------------------------------

Great, Mike,
it gets better and better; I saw LUCENE-1340 committed. Thanks to you, Grant, 
Doug and all the others who voted for 1349, this happened so quickly. Trust me, 
these two issues are really making my life easier. I pushed the decision to add 
new hardware to some future point (that is, saving the customer's money now)... 
a few weeks later would have been too late.

Now all that remains is one nice patch that lets us pass in our own byte[] 
when retrieving stored fields during search. I was thinking along the lines 
of what you did in Analyzers.

We could pull the same trick here, e.g.

Field Document.getBinaryValue(String FIELD_NAME, Field destination);

Field already has all the access methods (get/set).

The contract would be: if destination == null, a new Field is created and 
returned; otherwise we fill the one passed in and return that same object. 
The method should check whether the byte[] is big enough and, if not, apply 
a simple growth policy. This way we avoid a new byte[] each time a stored 
field is fetched.
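
To make the proposed contract concrete, here is a minimal sketch in plain Java. It does not use Lucene classes: ReusableBinary stands in for the Field passed as destination, StoredFieldReader.getBinaryValue stands in for the proposed Document method, and the 1.5x growth policy is just one possible choice. All of these names are hypothetical.

```java
/** Hypothetical reusable holder; stands in for a Field carrying binary data. */
final class ReusableBinary {
    byte[] bytes = new byte[0];
    int length; // number of valid bytes in `bytes`

    /** Make room for `needed` bytes; grow by at least 1.5x so repeated fetches rarely re-allocate. */
    void ensureCapacity(int needed) {
        if (bytes.length < needed) {
            int grown = bytes.length + (bytes.length >> 1);
            bytes = new byte[Math.max(needed, grown)]; // old content is not kept, it is overwritten anyway
        }
    }
}

final class StoredFieldReader {
    /**
     * Copies `stored` into `destination` and returns it.
     * A new holder is created only when `destination` is null;
     * a new byte[] is created only when the existing one is too small.
     */
    static ReusableBinary getBinaryValue(byte[] stored, ReusableBinary destination) {
        ReusableBinary dest = (destination != null) ? destination : new ReusableBinary();
        dest.ensureCapacity(stored.length);
        System.arraycopy(stored, 0, dest.bytes, 0, stored.length);
        dest.length = stored.length;
        return dest;
    }
}

final class ReuseDemo {
    public static void main(String[] args) {
        // Simulated stored fields for three hits (assumed data, for illustration only).
        byte[][] hits = { {1, 2, 3}, {4, 5}, {6, 7, 8, 9} };
        ReusableBinary buf = null;
        for (byte[] hit : hits) {
            buf = StoredFieldReader.getBinaryValue(hit, buf); // same holder reused across hits
            System.out.println(buf.length + " valid bytes, capacity " + buf.bytes.length);
        }
    }
}
```

Note that the caller has to read only the first `length` bytes, since the backing array may be larger than the current value.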

I have not looked closely at the code just now, but the last time I looked 
into it, it seemed quite simple to do something along these lines. Do you 
have any ideas how we could do it better?

Just a simple calculation for my case: 
the average hit count is around 200, and for each hit we have to fetch one 
stored field on which we do some post-processing, re-scoring and whatnot. 
Currently we run at most 30 requests/second, so with an average document 
length of 2K you land at 200 * 30 = 6000 object allocations per second, 
totaling 2K * 6000 = 12MB per second... only to get the data. I can imagine 
people with much longer documents (that would be the typical Lucene use 
case), where it gets worse. This would simply reduce GC pressure for a 
really small amount of work. I am sure it would have nice effects on some 
other use cases in Lucene too.
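
The same back-of-the-envelope estimate as a small Java sketch, so the numbers can be replayed with other inputs. The class and method names are made up for illustration, and "2K" is taken as 2000 bytes to match the 12MB figure.

```java
final class AllocationEstimate {
    /** One byte[] is allocated per fetched hit, so allocations scale with hits * queries. */
    static long allocationsPerSecond(int hitsPerQuery, int queriesPerSecond) {
        return (long) hitsPerQuery * queriesPerSecond;
    }

    /** Bytes allocated per second, given the average stored field size in bytes. */
    static long bytesPerSecond(long allocationsPerSecond, int avgDocBytes) {
        return allocationsPerSecond * avgDocBytes;
    }

    public static void main(String[] args) {
        long allocs = allocationsPerSecond(200, 30);
        long bytes = bytesPerSecond(allocs, 2000); // assumption: 2K taken as 2000 bytes
        // prints "6000 allocations/s, 12 MB/s"
        System.out.println(allocs + " allocations/s, " + bytes / 1_000_000 + " MB/s");
    }
}
```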

Thanks again to all the "workers" behind this great piece of software...
eks

PS: I need to find some time to peek at Paul's work in LUCENE-1345 and my 
wish list will be complete, at least for now (at least until you get your 
magic with the flexi index format done) :)
 

> support array/offset/ length setters for Field with binary data
> ---------------------------------------------------------------
>
>                 Key: LUCENE-1219
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1219
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Eks Dev
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1219.patch, LUCENE-1219.patch, LUCENE-1219.patch, 
> LUCENE-1219.patch, LUCENE-1219.take2.patch, LUCENE-1219.take3.patch
>
>
> Currently the Field/Fieldable interface supports only compact, zero-based 
> byte arrays. This forces end users to create new objects and copy content 
> into them before passing them to Lucene, as such fields are often of 
> variable size. Depending on the use case, avoiding this can bring a far 
> from negligible performance improvement. 
> This approach extends the Fieldable interface with 3 new methods: 
> getOffset(), getLength() and getBinaryValue() (this only returns a 
> reference to the array).
>    
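
For readers who have not opened the patch, the array/offset/length idea from the description can be sketched as follows. This is not the patch itself: BinarySlice is a hypothetical stand-in for a binary Field, and only the three getter names are taken from the description above.

```java
/** Hypothetical stand-in for a binary field that references a slice of a caller-owned array. */
final class BinarySlice {
    private byte[] value;
    private int offset;
    private int length;

    /** Stores a reference to `value` plus the slice bounds; nothing is copied. */
    void setValue(byte[] value, int offset, int length) {
        if (offset < 0 || length < 0 || offset > value.length - length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.value = value;
        this.offset = offset;
        this.length = length;
    }

    byte[] getBinaryValue() { return value; } // returns the reference, not a copy
    int getOffset() { return offset; }
    int getLength() { return length; }
}

final class SliceDemo {
    public static void main(String[] args) {
        byte[] scratch = new byte[16];           // large buffer reused by the caller
        scratch[4] = 10; scratch[5] = 20; scratch[6] = 30;

        BinarySlice field = new BinarySlice();
        field.setValue(scratch, 4, 3);           // variable-size payload, no compact copy needed

        // A consumer reads only the [offset, offset + length) window.
        int sum = 0;
        for (int i = field.getOffset(); i < field.getOffset() + field.getLength(); i++) {
            sum += field.getBinaryValue()[i];
        }
        System.out.println("sum of slice = " + sum);
    }
}
```

The trade-off is that the caller must not modify the shared array while the field is still in use.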

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

