[ 
https://issues.apache.org/jira/browse/HADOOP-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174098#comment-17174098
 ] 

Anoop Sam John commented on HADOOP-17038:
-----------------------------------------

As mentioned in the description, the main advantage is for HBase, where reads 
are mostly random short reads. By default HBase does only positional reads for 
gets/scans. We have a tracking mechanism in scans: if consecutive blocks are 
read by a scanner, we switch back to stream-based reads (the seek + read model). 
Scans done for compaction also use stream reads, i.e. seek + read. For these 
long reads (especially compaction, where only the compaction thread works on 
that dedicated FileInputStream), reading 4 MB per remote read is very useful. 
So reducing fs.azure.read.request.size is not a good option: the reduction 
would help the normal random row get case, but compactions would put more 
pressure on the FS. Also, if the same cluster serves range scans, those might 
suffer too.
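To make the trade-off concrete, here is a back-of-the-envelope sketch in Java. It assumes a simplified buffered stream in which every remote call fetches a full fs.azure.read.request.size chunk (ignoring end-of-file truncation). The 64 KB get size comes from the issue description; the 256 MB compaction input and the class and method names are hypothetical, chosen only for illustration.

```java
public class ReadSizeTradeoff {

    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    // Remote calls needed to read `bytes` sequentially when every call
    // fetches up to `requestSize` bytes (ceiling division).
    static long remoteCalls(long bytes, long requestSize) {
        return (bytes + requestSize - 1) / requestSize;
    }

    // Bytes pulled from the store to serve one short read of `readLen` bytes
    // when the stream always fills whole request-size chunks
    // (simplified model: ignores end-of-file truncation).
    static long bytesFetched(long readLen, long requestSize) {
        return remoteCalls(readLen, requestSize) * requestSize;
    }

    public static void main(String[] args) {
        long getSize = 64 * KB;           // short pread size from the issue description
        long compactionBytes = 256 * MB;  // hypothetical file read by a compaction
        for (long requestSize : new long[] {4 * MB, 64 * KB}) {
            System.out.printf("request size %d KB: get fetches %d KB, compaction needs %d calls%n",
                requestSize / KB,
                bytesFetched(getSize, requestSize) / KB,
                remoteCalls(compactionBytes, requestSize));
        }
    }
}
```

Under these assumptions, dropping the request size from 4 MB to 64 KB cuts the bytes fetched for a single get by 64x, but multiplies the number of remote calls for the compaction read by the same factor (64 calls become 4096).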
Real positional reads are what resolve this trade-off. In this patch the pread 
API is implemented in AbfsInputStream so that it does not rely on the buffer at 
all. As a result the API is no longer synchronized, and it reads only the exact 
number of bytes requested.
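A minimal model of the two pread strategies, in Java. SketchInputStream, InMemoryStore and their methods are hypothetical stand-ins, not the AbfsInputStream code from the patch. preadViaSeek follows the seek, read, seek-back approach described in the issue (synchronized, and served through the read-ahead buffer), while pread leaves the buffer and cursor untouched, takes no lock, and fetches only the requested range.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

public class PreadSketch {

    // Hypothetical stand-in for the remote store: serves bytes from memory
    // and counts how many bytes all "remote calls" transferred.
    static final class InMemoryStore {
        private final byte[] data;
        final AtomicLong bytesFetched = new AtomicLong();

        InMemoryStore(byte[] data) { this.data = data; }

        // Copies up to len bytes starting at offset; returns the count, or -1 at EOF.
        int read(long offset, byte[] dst, int off, int len) {
            if (offset >= data.length) return -1;
            int n = (int) Math.min(len, data.length - offset);
            System.arraycopy(data, (int) offset, dst, off, n);
            bytesFetched.addAndGet(n);
            return n;
        }
    }

    static final class SketchInputStream {
        private final InMemoryStore store;
        private final int readRequestSize;  // plays the role of fs.azure.read.request.size
        private final byte[] buffer;
        private long bufferStart = 0;       // file offset of buffer[0]
        private int bufferLen = 0;          // number of valid bytes in buffer
        private long pos = 0;               // stream cursor

        SketchInputStream(InMemoryStore store, int readRequestSize) {
            this.store = store;
            this.readRequestSize = readRequestSize;
            this.buffer = new byte[readRequestSize];
        }

        synchronized void seek(long p) { pos = p; }

        synchronized long getPos() { return pos; }

        // Stream read: on a buffer miss, refills a whole request-size chunk (read-ahead).
        synchronized int read(byte[] b, int off, int len) {
            if (pos < bufferStart || pos >= bufferStart + bufferLen) {
                int n = store.read(pos, buffer, 0, readRequestSize);
                if (n < 0) return -1;
                bufferStart = pos;
                bufferLen = n;
            }
            int idx = (int) (pos - bufferStart);
            int n = Math.min(len, bufferLen - idx);
            System.arraycopy(buffer, idx, b, off, n);
            pos += n;
            return n;
        }

        // Default behaviour described in the issue: seek, read, seek back,
        // all under the stream lock and through the read-ahead buffer.
        synchronized int preadViaSeek(long position, byte[] b, int off, int len) {
            long oldPos = pos;
            try {
                seek(position);
                return read(b, off, len);
            } finally {
                seek(oldPos);
            }
        }

        // Pure positional read: no lock, no buffer, no cursor change;
        // fetches only the requested bytes.
        int pread(long position, byte[] b, int off, int len) {
            return store.read(position, b, off, len);
        }
    }

    public static void main(String[] args) {
        byte[] file = new byte[16 * 1024 * 1024];  // 16 MB dummy file
        Arrays.fill(file, (byte) 7);
        int fourMb = 4 * 1024 * 1024;
        byte[] dst = new byte[64 * 1024];          // one 64 KB short read

        InMemoryStore s1 = new InMemoryStore(file);
        new SketchInputStream(s1, fourMb).preadViaSeek(8L * 1024 * 1024, dst, 0, dst.length);
        System.out.println("seek+read fetched " + s1.bytesFetched.get() + " bytes");

        InMemoryStore s2 = new InMemoryStore(file);
        new SketchInputStream(s2, fourMb).pread(8L * 1024 * 1024, dst, 0, dst.length);
        System.out.println("pread fetched     " + s2.bytesFetched.get() + " bytes");
    }
}
```

With a 4 MB request size, a 64 KB positional read at offset 8 MB pulls 4 MB from the store through preadViaSeek but only 64 KB through pread. Because pread touches no mutable stream state, concurrent callers do not have to serialize on the stream lock.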


> Support positional read in AbfsInputStream
> ------------------------------------------
>
>                 Key: HADOOP-17038
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17038
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Major
>              Labels: HBase, abfsactive
>
> Right now it does a seek to the position, reads, and then seeks back to the 
> old position (as per the implementation in the super class).
> In HBase-style workloads we rely mostly on short preads (64 KB by 
> default). So it would be ideal to support a pure positional read API that 
> does not even keep the data in a buffer but reads only the data the caller 
> asks for (not reading ahead more data as per the read size config).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
