[jira] [Commented] (LUCENE-4563) Fix DirTaxoWriter's Codec - don't rely on the default

Shai Erera (JIRA) Mon, 19 Nov 2012 06:01:03 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500237#comment-13500237
 ]


Shai Erera commented on LUCENE-4563:
------------------------------------

bq. I think you should just stick with the default. 

Maybe, but it's worth checking.

bq. Of course tests run slowly with SimpleText!

Sure, but in the facet tests, running with SimpleText usually doesn't serve 
any purpose.

Anyway, I don't rule out that this issue will be cancelled in the end, but at 
least we'll know whether the default Codec is good enough, or whether a more 
specialized one performs better.
                
> Fix DirTaxoWriter's Codec - don't rely on the default
> -----------------------------------------------------
>
>                 Key: LUCENE-4563
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4563
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>
> Today, DirTaxoWriter opens an IndexWriter using the default Codec. While 
> running tests, I noticed that some of them sometimes take a very long time 
> to complete. Debugging, I realized that they use the SimpleText codec 
> because that's what the test-framework drew at random.
> That got me thinking about whether we should really depend on the default 
> Codec, or use a special codec that is more suitable for the taxonomy index's 
> unique characteristics. Basically, the taxonomy index has two fields:
> # One in which the category path is saved, as StringField, and therefore each 
> term is associated with exactly one document
> # Another field with a single term, where each document stores its 
> category's parent as the position of that term.
> Initially, I thought that we should really be using PulsingCodec. After a 
> brief chat about it with Robert, he said that the Lucene41 Codec acts like 
> pulsing for fields like that. So I'm thinking that we should either:
> * Hard-code to Lucene41, if it's indeed useful.
> * Write a completely new Codec that is specialized for this case. I.e. 
> Lucene41 may handle these cases efficiently, but its code needs to be 
> prepared for other cases too, so we may be able to write something more 
> efficient.
> I'm opening this as a placeholder. I think we should first come up with a 
> decent benchmark test in order to validate the results. The benchmark 
> package now contains some facet-related stuff, so I'll check whether that's 
> enough.
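
For readers unfamiliar with the taxonomy index, the two-field layout described 
in the issue can be pictured with a toy in-memory model. This is only a 
sketch, not Lucene code: the class name TaxonomySketch, the '/' path 
separator, and the choice of 0 for the root and -1 for "no parent" are 
assumptions made for illustration. It shows why the access pattern is 
unusual: every path maps to exactly one document, and every document carries 
exactly one parent value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the two taxonomy fields; hypothetical, not the Lucene API. */
class TaxonomySketch {
    static final int ROOT = 0;        // assumed ordinal of the root category
    static final int NO_PARENT = -1;  // assumed marker for "root has no parent"

    // Field 1: full category path -> ordinal; each path belongs to exactly one document.
    private final Map<String, Integer> ordinalByPath = new HashMap<>();
    // Field 2: one parent ordinal per document, indexed by the document's ordinal.
    private final List<Integer> parentByOrdinal = new ArrayList<>();

    TaxonomySketch() {
        ordinalByPath.put("", ROOT);
        parentByOrdinal.add(NO_PARENT);
    }

    /** Adds a path (components separated by '/') plus any missing ancestors. */
    int addCategory(String path) {
        Integer existing = ordinalByPath.get(path);
        if (existing != null) {
            return existing;
        }
        int slash = path.lastIndexOf('/');
        // Recurse on the parent path first so ancestors get lower ordinals.
        int parent = addCategory(slash < 0 ? "" : path.substring(0, slash));
        int ordinal = parentByOrdinal.size();
        ordinalByPath.put(path, ordinal);
        parentByOrdinal.add(parent);
        return ordinal;
    }

    /** Returns the parent ordinal recorded for the given ordinal. */
    int getParent(int ordinal) {
        return parentByOrdinal.get(ordinal);
    }

    public static void main(String[] args) {
        TaxonomySketch taxo = new TaxonomySketch();
        int twain = taxo.addCategory("Author/Mark Twain");
        int author = taxo.addCategory("Author");
        System.out.println("ordinal " + twain + " has parent " + taxo.getParent(twain)
                + " (Author is " + author + ")");
    }
}
```

In this model a lookup by path touches a single entry and a parent lookup is 
a single array read, which is the kind of "one term, one document" shape the 
issue suggests a pulsing-style or specialized codec might exploit.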

