[jira] [Commented] (LUCENE-7583) Can we improve OutputStreamIndexOutput's byte buffering when writing each BKD leaf block?

Michael McCandless (JIRA) Tue, 06 Dec 2016 07:35:35 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-7583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725830#comment-15725830 ]


Michael McCandless commented on LUCENE-7583:
--------------------------------------------

bq. Are we sure that we do not open the IndexOutput in one thread and hand it 
over to another one? 

Yeah, the {{IndexOutput}} is opened in {{Lucene60PointsWriter}}, and then that 
same thread goes and writes all points via {{writeField}}.  At IW flush time 
it's an indexing thread, and at merge time it's a merge thread, but it should 
only ever be a single thread touching that {{IndexOutput}}.  The benchmark I'm 
running only ever uses a single thread anyway ...

bq. we should also make all references to the IndexOutput private, so it cannot 
escape the current thread (to help hotspot). This means: no non-private fields 
holding the reference to the stream.

I'll try to do this; there's at least one place where it's protected, but 
that's way high up in the stack ({{Lucene60PointsWriter}}).

bq. If we are really required to fork the buffered stream, we may use: 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/FastOutputStream.java
 (but without the DataOutput interface impl).

I'll test that too.

Thanks [~thetaphi].

> Can we improve OutputStreamIndexOutput's byte buffering when writing each BKD 
> leaf block?
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7583
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: master (7.0), 6.4
>
>         Attachments: LUCENE-7583-hardcode-writeVInt.patch, LUCENE-7583.patch
>
>
> When BKD writes its leaf blocks, it's essentially a lot of tiny writes (vint, 
> int, short, etc.), and I've seen deep thread stacks through our IndexOutput 
> impl ({{OutputStreamIndexOutput}}) when pulling hot threads while BKD is 
> writing.
> So I tried a small change, to have BKDWriter do its own buffering, by first 
> writing each leaf block into a {{RAMOutputStream}}, and then dumping that (in 
> 1 KB byte[] chunks) to the actual IndexOutput.
> This gives a non-trivial reduction (~6%) in the total BKD writing + 
> merging time on the 20M NYC taxis nightly benchmark (2 runs each):
> Trunk, sparse:
>   - total: 64.691 sec
>   - total: 64.702 sec
> Patch, sparse:
>   - total: 60.820 sec
>   - total: 60.965 sec
> Trunk, dense:
>   - total: 62.730 sec
>   - total: 62.383 sec
> Patch, dense:
>   - total: 58.805 sec
>   - total: 58.742 sec
> The results seem to be consistent and reproducible.  I'm using Java 1.8.0_101 
> on a fast SSD on Ubuntu 16.04.
> It's sort of weird and annoying that this helps so much, because 
> {{OutputStreamIndexOutput}} already uses java's {{BufferedOutputStream}} 
> (default 8 KB buffer) to buffer writes.
> [~thetaphi] suggested maybe hotspot is failing to inline/optimize 
> {{writeByte}}, or the call stack just has too many layers.
> We could commit this patch (it's trivial) but it'd be nice to understand and 
> fix why buffering writes is somehow costly so any other Lucene codec 
> components that write lots of little things can be improved too.
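
For readers who want to picture the per-leaf-block buffering described in the issue, here is a minimal sketch using only {{java.io}}. It is not the attached patch and not Lucene code: {{LeafBlockBuffer}}, {{flushTo}} and {{CHUNK_SIZE}} are hypothetical names, and a {{ByteArrayOutputStream}} stands in for {{RAMOutputStream}} purely for illustration. The idea is that the many tiny writes land in a private scratch buffer, and only a few bulk writes of at most 1 KB reach the underlying stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical illustration of per-leaf-block buffering; not Lucene's BKDWriter. */
final class LeafBlockBuffer {
  // 1 KB chunks, matching the chunk size mentioned in the issue description.
  private static final int CHUNK_SIZE = 1024;

  // Private scratch buffer; an assumed stand-in for RAMOutputStream.
  private final ByteArrayOutputStream scratch = new ByteArrayOutputStream();

  /** Variable-length int: 7 payload bits per byte, high bit set on all but the last byte. */
  void writeVInt(int i) {
    while ((i & ~0x7F) != 0) {
      scratch.write((i & 0x7F) | 0x80);
      i >>>= 7;
    }
    scratch.write(i);
  }

  /** Fixed-width 32-bit int, most significant byte first. */
  void writeInt(int i) {
    scratch.write(i >>> 24);
    scratch.write(i >>> 16);
    scratch.write(i >>> 8);
    scratch.write(i);
  }

  /**
   * Copies the buffered block to {@code out} in chunks of at most 1 KB,
   * clears the scratch buffer, and returns the number of bytes copied.
   */
  int flushTo(OutputStream out) throws IOException {
    byte[] block = scratch.toByteArray();
    for (int off = 0; off < block.length; off += CHUNK_SIZE) {
      out.write(block, off, Math.min(CHUNK_SIZE, block.length - off));
    }
    scratch.reset();
    return block.length;
  }
}
```

A caller would write one leaf block through {{writeVInt}} / {{writeInt}} and then call {{flushTo}} once per block with the real output stream. Whether this helps in a given setup would need to be measured, as in the numbers above.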



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

