[ 
https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129205#comment-14129205
 ] 

Gopal V commented on HIVE-8038:
-------------------------------

bq. Example : stripe size 512MB and BLOCK SIZE is 400MB, in that case, split 
would span more than one block.

I think that would be a case for reducing the stripe size, if performance is 
the primary goal. We already default to a 64MB stripe in Hive 0.14, down from 
256MB, because of similar issues.

That said, a TreeMap is probably the right way to do this when there is 
variable-length block support (and it also lets us turn off orc.padding).
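
For illustration, a minimal sketch of that TreeMap-based lookup: build an 
offset-keyed index once, then resolve any split offset with floorEntry(), 
without assuming uniform block sizes. Class and field names here are 
illustrative stand-ins, not the actual Hive/ORC shim classes.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: locate the block containing a given split offset via a TreeMap,
// without assuming all blocks have the same size. "Block" is a stand-in
// for org.apache.hadoop.fs.BlockLocation; not the actual Hive code.
public class BlockIndexSketch {
    static class Block {
        final long offset, length;
        final String[] hosts;
        Block(long offset, long length, String... hosts) {
            this.offset = offset; this.length = length; this.hosts = hosts;
        }
    }

    // One-time O(n) build of the offset -> block index.
    static TreeMap<Long, Block> index(Block[] blocks) {
        TreeMap<Long, Block> map = new TreeMap<>();
        for (Block b : blocks) map.put(b.offset, b);
        return map;
    }

    // O(log n) lookup: entry with the greatest key <= offset.
    static Block blockAt(TreeMap<Long, Block> map, long offset) {
        Map.Entry<Long, Block> e = map.floorEntry(offset);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        // Variable-length blocks: 0..400MB and 400MB..912MB (sizes differ).
        Block[] blocks = {
            new Block(0L, 400L << 20, "hostA"),
            new Block(400L << 20, 512L << 20, "hostB"),
        };
        TreeMap<Long, Block> map = index(blocks);
        System.out.println(blockAt(map, 100L << 20).hosts[0]); // hostA
        System.out.println(blockAt(map, 500L << 20).hosts[0]); // hostB
    }
}
```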

bq. We want to generalize the usage for FileSystems apart from HDFS.

My current ~30TB tests are all off HDFS. Can you tell me which filesystem 
implementation you are targeting with these fixes?

> Decouple ORC files split calculation logic from Filesystem's get file 
> location implementation
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8038
>                 URL: https://issues.apache.org/jira/browse/HIVE-8038
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 0.13.1
>            Reporter: Pankit Thapar
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8038.patch
>
>
> What is the Current Logic
> ======================
> 1. Get the file blocks from FileSystem.getFileBlockLocations(), which 
> returns an array of BlockLocation.
> 2. In SplitGenerator.createSplit(), check whether the split spans one block 
> or multiple blocks.
> 3. If the split spans just one block, use the array index (index = 
> offset/blockSize) to get the corresponding host holding the BlockLocation.
> 4. If the split spans multiple blocks, get all hosts that hold at least 80% 
> of the maximum amount of split data hosted by any single host.
> 5. Add the split to the list of splits.
> Issue with Current Logic
> =====================
> The logic depends on the FileSystem API's block location calculation: it 
> returns an array, and indexing directly into that array is only valid if 
> the FileSystem makes all blocks the same size.
>  
> What is the Fix
> =============
> 1a. Get the file blocks from FileSystem.getFileBlockLocations(), which 
> returns an array of BlockLocation.
> 1b. Convert the array into a TreeMap<offset, BlockLocation> and return it 
> through getLocationsWithOffSet().
> 2. In SplitGenerator.createSplit(), check whether the split spans one block 
> or multiple blocks.
> 3. If the split spans just one block, use TreeMap.floorEntry(key) to get 
> the entry with the greatest offset less than or equal to the split's 
> offset, and get the corresponding host.
> 4a. If the split spans multiple blocks, get a submap containing all entries 
> whose BlockLocations fall between offset and offset + length.
> 4b. Get all hosts that hold at least 80% of the maximum amount of split 
> data hosted by any single host.
> 5. Add the split to the list of splits.
> What are the major changes in logic
> ==============================
> 1. Store BlockLocations in a Map instead of an array.
> 2. Call SHIMS.getLocationsWithOffSet() instead of getLocations().
> 3. The one-block case is checked by "if (offset + length <= 
> start.getOffset() + start.getLength())" instead of "if ((offset % 
> blockSize) + length <= blockSize)".
> What is the effect on Complexity (Big O)
> =================================
> 1. We add an O(n) loop to build the TreeMap from the array, but it is a 
> one-time cost and is not paid for each split.
> 2. In the one-block case, the lookup is O(log n) worst case, where it was 
> O(1) before.
> 3. Getting the submap is O(log n).
> 4. In the multiple-block case, building the list of hosts is O(m), where it 
> was O(n) and m < n, since previously we iterated over all block locations 
> but now iterate only over the blocks that fall within the offset range we 
> need.
> What are the benefits of the change
> ==============================
> 1. With this fix, we no longer depend on the BlockLocations returned by the 
> FileSystem to figure out which block corresponds to a given offset and 
> blockSize.
> 2. It is no longer assumed that block lengths are the same for all blocks 
> on all FileSystems.
> 3. Previously we used blockSize for the one-block case and block.length for 
> the multiple-block case; now we determine the block from its actual length 
> and offset.
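
The multiple-block path of the quoted fix (steps 4a/4b) can be sketched as 
follows: take the submap of blocks a split touches, tally the split's bytes 
per host, and keep every host holding at least 80% of the best host's bytes. 
Class and method names are illustrative stand-ins, not the actual Hive/ORC 
code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of steps 4a/4b: subMap over the blocks covered by a split, then a
// per-host byte tally with an 80%-of-max cutoff. Illustrative stand-ins only.
public class SplitHostsSketch {
    static class Block {
        final long offset, length;
        final String[] hosts;
        Block(long offset, long length, String... hosts) {
            this.offset = offset; this.length = length; this.hosts = hosts;
        }
    }

    static List<String> hostsFor(TreeMap<Long, Block> index,
                                 long offset, long length) {
        // 4a: submap of blocks overlapping [offset, offset + length).
        Long startKey = index.floorKey(offset);
        NavigableMap<Long, Block> range = index.subMap(
            startKey == null ? offset : startKey, true, offset + length, false);

        // Tally how many bytes of the split each host serves.
        Map<String, Long> bytesPerHost = new HashMap<>();
        long maxBytes = 0;
        for (Block b : range.values()) {
            long overlap = Math.min(b.offset + b.length, offset + length)
                         - Math.max(b.offset, offset);
            if (overlap <= 0) continue;
            for (String h : b.hosts) {
                long total = bytesPerHost.merge(h, overlap, Long::sum);
                maxBytes = Math.max(maxBytes, total);
            }
        }

        // 4b: keep hosts holding at least 80% of the best host's bytes.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : bytesPerHost.entrySet()) {
            if (e.getValue() * 10 >= maxBytes * 8) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Long, Block> idx = new TreeMap<>();
        idx.put(0L, new Block(0L, 100L, "A", "B"));
        idx.put(100L, new Block(100L, 200L, "A", "C"));
        // A serves 300 bytes, C 200, B 100; only A clears the 80% cutoff.
        System.out.println(hostsFor(idx, 0L, 300L)); // [A]
    }
}
```

Note that only blocks inside the submap are visited, which is the O(m) versus 
O(n) improvement the description claims for the multiple-block case.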



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
