[ 
https://issues.apache.org/jira/browse/HBASE-26183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shriram updated HBASE-26183:
----------------------------
    Labels: performance  (was: )

> Size of the Result object while querying huge data from HBASE table
> -------------------------------------------------------------------
>
>                 Key: HBASE-26183
>                 URL: https://issues.apache.org/jira/browse/HBASE-26183
>             Project: HBase
>          Issue Type: New Feature
>          Components: scan
>    Affects Versions: 1.1.13
>            Reporter: shriram
>            Priority: Major
>              Labels: performance
>
>  
> I am trying to query an HBase table by row key. We have the following structure:
>  * an index table that holds the row keys of the actual table
>  * the actual table, which contains JSON data in compressed format
> To query HBase, I first scan the index table with some filters, which 
> yields the row keys as byte arrays. Once we have the row keys, we pass a 
> list of Gets to get() on the Table object. We then iterate over the returned 
> results and build a list containing the compressed JSON objects. We do not 
> know the size or number of these objects in advance, and if there are many 
> of them we may run out of memory (OOM). Is there an option to return an 
> Iterator or to buffer the results so that we can avoid the OOM?
>  {{for (byte[] rowkey : indexTableOutput) {
>     Get get = new Get(rowkey)
>             .addFamily(Bytes.toBytes(columnFamilty))
>             .setMaxVersions(MAX_VERSIONS);
>     listOfget.add(get);
> }}}
> The code above builds the list of Gets from the row keys retrieved from the 
> index table.
>  {{TableName tableName = TableName.valueOf("table1");
> Table tableObj = conn.getTable(tableName);
> Result[] results = tableObj.get(listOfget);}}
> We have a few questions about the code above. Any help would be 
> appreciated.
>  * If there is a huge amount of data, will Result[] contain all the results?
>  * How can we return an iterator-like object and leave the processing to 
> the consumer? Keeping all the data in memory while processing will result 
> in OOM.
>  * Are there any other options to return a limited amount of data so that 
> the consumer can process it and then continue?
> I can see that a ResultScanner is returned for Scan objects, but I could not 
> find an equivalent option for a list of Gets. Here we know the exact keys 
> from the index table.
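
One possible way to bound memory for the list-of-Gets case described above is to issue the Gets in fixed-size batches and expose the results through an Iterator, so that only one batch of results is held at a time. The following is a minimal sketch, not an HBase API: the class name BatchedLookup, the batchSize parameter, and the fetcher function are all hypothetical names introduced for illustration. The sketch is deliberately independent of the HBase client so it can run on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical helper: lazily resolves keys in fixed-size batches so that
 * only one batch of results is held in memory at a time.
 */
final class BatchedLookup<K, V> implements Iterator<V> {
    private final Iterator<K> keys;
    private final int batchSize;
    // Assumption: the caller supplies the function that performs one multi-get.
    private final Function<List<K>, List<V>> fetcher;
    private Iterator<V> current = Collections.emptyIterator();

    BatchedLookup(Iterable<K> keys, int batchSize, Function<List<K>, List<V>> fetcher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.keys = keys.iterator();
        this.batchSize = batchSize;
        this.fetcher = fetcher;
    }

    @Override
    public boolean hasNext() {
        // Keep fetching batches until one yields results or the keys run out.
        while (!current.hasNext() && keys.hasNext()) {
            List<K> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && keys.hasNext()) {
                batch.add(keys.next());
            }
            current = fetcher.apply(batch).iterator();
        }
        return current.hasNext();
    }

    @Override
    public V next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

With the HBase client, the fetcher could plausibly build one Get per key in the batch, call tableObj.get(listOfget) on that batch, and wrap the returned Result[] in a list, converting the IOException into an unchecked exception. The batch size would need to be tuned against the size of the compressed JSON values, since each batch is still fully materialized.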



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
