[jira] [Commented] (LUCENE-6841) LZ4 compression using too much CPU time

Karl von Randow (JIRA) Fri, 16 Oct 2015 16:42:45 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961541#comment-14961541 ]


Karl von Randow commented on LUCENE-6841:
-----------------------------------------

Thank you both very much for your prompt responses.

My search index on disk is 5.2G. Both my JVM heap and the machine's physical 
memory are larger than that, so the entire index can possibly fit in memory. 
Could you point me to a way to verify this?

The number of docs is in the low millions. Each doc is usually quite small: 
10-20 fields, with only one containing more than a single line of text.

My profiling only covered threads that were in the RUNNABLE state, so I don't 
_believe_ that it includes IO waiting time. So I do suspect the time is 
actually being spent in LZ4 decompression rather than in IO.
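
As a rough cross-check of the RUNNABLE-only profiling described above, something 
like the following could be run inside the application. It is only a sketch 
using the standard java.lang.management API; the class name RunnableSampler and 
the method sampleOnce are hypothetical names made up for this illustration. It 
counts RUNNABLE threads and how many of them are currently executing native 
code. The reason for the second count is that the JVM reports a thread as 
RUNNABLE even while it is blocked inside a native call waiting on the operating 
system, so RUNNABLE alone does not strictly rule out IO waits. The in-native 
count is only a hint: it would not reveal, for example, page faults taken while 
reading a memory-mapped file.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Lucene or of any profiler.
public class RunnableSampler {

    /**
     * Takes one thread dump and returns two counts:
     * [0] threads in state RUNNABLE,
     * [1] RUNNABLE threads whose top stack frame is a native method.
     */
    public static int[] sampleOnce() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int runnable = 0;
        int runnableInNative = 0;
        // false, false: skip lock/monitor details, which are not needed here.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.RUNNABLE) {
                continue;
            }
            runnable++;
            if (info.isInNative()) {
                runnableInNative++;
            }
        }
        return new int[] {runnable, runnableInNative};
    }

    public static void main(String[] args) {
        int[] counts = sampleOnce();
        System.out.println("RUNNABLE threads: " + counts[0]
                + ", of which in native code: " + counts[1]);
    }
}
```

Sampling this repeatedly under load and looking at which native frames show up 
(via ThreadInfo.getStackTrace()) would give a better picture than a single 
dump.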

To proceed, I think I need to do a run without compression to compare the CPU 
usage. I presume this will require rebuilding the search index. Isn't there an 
upgrade process for converting between codecs? Perhaps I could use that to 
speed things up.

> LZ4 compression using too much CPU time
> ---------------------------------------
>
>                 Key: LUCENE-6841
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6841
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 5.3.1
>         Environment: Linux, Java 8
>            Reporter: Karl von Randow
>
> I am using Lucene for search indexing, including storing a large number of 
> small fields, and some larger plain text fields, and searching using both 
> exact matches and analyzed queries.
> LZ4 (specifically the decompress method) is using almost exactly 50% of the 
> application's CPU time.
> It seems to me that LZ4 is inappropriate for my use case. I note that I can 
> choose BEST_SPEED or BEST_COMPRESSION.
> Would it be palatable to add a NO_COMPRESSION option, or some way to pick and 
> choose which fields get compressed? Perhaps a minimum length of a field could 
> be specified before it's compressed? I'm not sure if that's possible.
> If this approach, or something similar, is palatable, I would be happy to 
> contribute a patch (or to consume and test a patch).



