[ 
https://issues.apache.org/jira/browse/LUCENE-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804299#action_12804299
 ] 

Paul Elschot commented on LUCENE-2232:
--------------------------------------

I tried running contrib/benchmark with the patch applied, using the Reuters 
data and the sloppy phrase queries there. That did not really show any 
performance difference. Some further word counting revealed that the average 
number of words in a Reuters article is just below 129, which puts it in the 
category of small fields for which hardly any performance difference is 
expected for queries.
Building the index was somewhat slower, as expected.
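
For reference, the kind of word counting mentioned above could be sketched 
roughly as follows. This is only an illustration: the class and method names 
are hypothetical, and splitting on whitespace is an assumption that will not 
match the token counts an analyzer would produce.

```java
import java.util.List;

public class WordCountSketch {
    /** Counts whitespace-separated tokens; a rough stand-in for analyzer tokens. */
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Average number of words per document body; 0.0 for an empty collection. */
    static double averageWords(List<String> bodies) {
        if (bodies.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (String body : bodies) {
            total += countWords(body);
        }
        return (double) total / bodies.size();
    }

    /** Number of document bodies with at least minWords words. */
    static long countAtLeast(List<String> bodies, int minWords) {
        long matches = 0;
        for (String body : bodies) {
            if (countWords(body) >= minWords) {
                matches++;
            }
        }
        return matches;
    }
}
```

The same countAtLeast helper can be pointed at any other collection of 
document bodies to see how many of them exceed a given length threshold.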

Then I tried using sloppy phrase queries on the wiki data. But I ran into 
an "org.xml.sax.SAXParseException: Content is not allowed in prolog" while 
trying to build an index from enwiki.txt. I think I'm doing something 
completely wrong here, but I have no idea how to fix it.
Could anyone give me some tips on how to continue with this?
Are there enough wiki articles of at least say 256 words in the 20070527 wiki 
dump?
Is there a way to generate interesting sloppy phrase queries for the wiki data?
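
As a general note on the exception quoted above: a SAX parser typically 
raises it when the input does not begin with an XML declaration or a root 
element, for example when a plain-text file is handed to an XML parser. 
Whether that is what happened with enwiki.txt is not confirmed here. A 
minimal sketch that reproduces this class of error with the JDK parser (the 
class name and sample strings are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrologErrorSketch {
    /** Returns the SAXParseException raised while parsing, or null if parsing succeeds. */
    static SAXParseException parse(String content) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Raised for malformed input, e.g. text before any XML markup.
            return e;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not the case of interest here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed XML parses without an exception.
        System.out.println(parse("<doc>text</doc>"));
        // Plain text (hypothetical tab-separated line) fails at the start of the input.
        System.out.println(parse("Some title\t2007-05-27\tbody text"));
    }
}
```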


> Use VShort to encode positions
> ------------------------------
>
>                 Key: LUCENE-2232
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2232
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Paul Elschot
>         Attachments: LUCENE-2232-nonbackwards.patch
>
>
> Improve decoding speed for typical case of two bytes for a delta position at 
> the cost of increasing the size of the proximity file.
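
To illustrate the trade-off in the issue description, here is a rough sketch 
comparing a Lucene-style VInt (7 payload bits per byte) with a VShort-like 
encoding. The patch itself is not reproduced here: the VShort layout below 
(15 payload bits per 16-bit unit, high bit as continuation flag, high byte 
written first) is an assumption made only for this example, and all class and 
method names are hypothetical. Values are assumed to be non-negative deltas.

```java
import java.io.ByteArrayOutputStream;

public class PositionEncodingSketch {
    /** VInt: 7 payload bits per byte, high bit set when more bytes follow. */
    static byte[] encodeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    static int decodeVInt(byte[] in) {
        int value = 0;
        int shift = 0;
        for (byte b : in) {
            value |= (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }

    /** Assumed VShort layout: 15 payload bits per 16-bit unit, high bit = continuation. */
    static byte[] encodeVShort(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FFF) != 0) {
            int unit = (value & 0x7FFF) | 0x8000;
            out.write(unit >>> 8);   // high byte of the unit
            out.write(unit & 0xFF);  // low byte of the unit
            value >>>= 15;
        }
        out.write(value >>> 8);
        out.write(value & 0xFF);
        return out.toByteArray();
    }

    static int decodeVShort(byte[] in) {
        int value = 0;
        int shift = 0;
        for (int i = 0; i < in.length; i += 2) {
            int unit = ((in[i] & 0xFF) << 8) | (in[i + 1] & 0xFF);
            value |= (unit & 0x7FFF) << shift;
            shift += 15;
        }
        return value;
    }
}
```

Under these assumptions, a delta below 128 takes one byte as a VInt but two 
bytes as a VShort, which is where the larger proximity file comes from. Any 
delta below 32768 is decoded from a single 16-bit unit, whereas a VInt needs 
a per-byte continuation check and up to three bytes for the same range.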

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


