[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

Mark Miller (JIRA) Sun, 18 Aug 2013 10:14:28 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743289#comment-13743289
 ]


Mark Miller commented on SOLR-5150:
-----------------------------------

bq. Based on the feedback I got, the understanding was that doing the seek followed 
by the readFully should have given the highest performance.

That would go along with speed improvements we saw in indexing. As Uwe said, 
that does seem kind of weird or buggish, but the numbers seem to bear it out.

I think the issue is that, in the indexing case, it's one thread doing the reading - 
and our perf tests do show it to be quite a bit faster. But the sync 
simply *kills* concurrent query reads. We see throughput drop from about 38 qps to 3 
qps. Totally unacceptable performance.

This is why I've been looking at doing the synched seek + read or just the read 
based on the IO context.
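
A minimal sketch of what choosing the read path by context could look like. This is not the patch itself: it uses java.nio's FileChannel as a stand-in for the HDFS stream, and the class ContextAwareReader and the enum ReadKind are hypothetical names invented for illustration (ReadKind is not Lucene's IOContext).

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical reader that picks its read strategy from a context flag. */
final class ContextAwareReader {
    /** Assumed stand-in for an IO context; not Lucene's IOContext. */
    enum ReadKind { MERGE, QUERY }

    private final FileChannel channel;
    private final ReadKind kind;

    ContextAwareReader(FileChannel channel, ReadKind kind) {
        this.channel = channel;
        this.kind = kind;
    }

    /** Fills buf[offset, offset + length) with the bytes starting at position. */
    void readFully(long position, byte[] buf, int offset, int length) throws IOException {
        ByteBuffer dst = ByteBuffer.wrap(buf, offset, length);
        if (kind == ReadKind.MERGE) {
            // Single-threaded streaming case: seek, then read sequentially.
            // The lock keeps the seek + read pair atomic if clones share the channel.
            synchronized (channel) {
                channel.position(position);
                while (dst.hasRemaining()) {
                    if (channel.read(dst) < 0) {
                        throw new EOFException("hit end of file before reading all requested bytes");
                    }
                }
            }
        } else {
            // Concurrent query case: positional reads leave the shared channel
            // position untouched, so no lock is taken.
            long pos = position;
            while (dst.hasRemaining()) {
                int n = channel.read(dst, pos);
                if (n < 0) {
                    throw new EOFException("hit end of file before reading all requested bytes");
                }
                pos += n;
            }
        }
    }
}
```

Both branches loop until the buffer is full, so neither can return a short read. Only the MERGE branch pays for the lock.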
                
> HdfsIndexInput may not fully read requested bytes.
> --------------------------------------------------
>
>                 Key: SOLR-5150
>                 URL: https://issues.apache.org/jira/browse/SOLR-5150
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.4
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>             Fix For: 4.5, 5.0
>
>         Attachments: SOLR-5150.patch
>
>
> Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - 
> the read call we are using may not read all of the requested bytes - it 
> returns the number of bytes actually read - which we ignore.
> Blur moved to using a seek and then readFully call - synchronizing across the 
> two calls to deal with clones.
> We have seen that this really kills performance; using the readFully call that 
> lets you pass the position, rather than first doing a seek, performs much 
> better and does not require the synchronization.
> I also noticed that the seekInternal impl should not seek but be a no-op, 
> since we are seeking on the read.
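
To illustrate the short-read problem described above, here is a small sketch of a positional "read fully" loop. It assumes java.nio's FileChannel as a stand-in for the HDFS input stream, and the class and method names (PositionalReads, readFullyAt) are made up for the example; it is not the code in the attached patch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper showing a positional read that never returns short. */
final class PositionalReads {
    private PositionalReads() {}

    /**
     * Fills buf[offset, offset + length) from the channel, starting at position.
     * A single read call may return fewer bytes than requested, so the return
     * value is checked and the read is repeated until the buffer is full.
     */
    static void readFullyAt(FileChannel channel, long position,
                            byte[] buf, int offset, int length) throws IOException {
        ByteBuffer dst = ByteBuffer.wrap(buf, offset, length);
        long pos = position;
        while (dst.hasRemaining()) {
            // Positional read: does not change the channel's own position,
            // so no seek and no lock around a seek + read pair is needed.
            int n = channel.read(dst, pos);
            if (n < 0) {
                throw new EOFException("hit end of file before reading all requested bytes");
            }
            pos += n;
        }
    }
}
```

Because the position is passed with every call, readers that share the same underlying channel do not interfere with each other's offsets.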
