[
https://issues.apache.org/jira/browse/HBASE-26183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396723#comment-17396723
]
Michael Stack commented on HBASE-26183:
---------------------------------------
Do you have a bound on the size of the JSON object? Do you have to fetch
everything in one go? Is the OOME on the client side or the server side?
Also, this sort of query is best suited to the user mailing list rather than
here in JIRA. Mind posting there? Thanks.
> Size of the Result object while querying huge data from HBASE table
> -------------------------------------------------------------------
>
> Key: HBASE-26183
> URL: https://issues.apache.org/jira/browse/HBASE-26183
> Project: HBase
> Issue Type: New Feature
> Components: scan
> Affects Versions: 1.1.13
> Reporter: shriram
> Priority: Major
> Labels: performance
>
>
> I am trying to query an HBase table by row key. We have the following structure:
> * an index table, which holds the row keys of the actual table
> * the actual table, which contains JSON data in compressed format.
> To query HBase, I first scan the index table for row keys, using a Scan with
> some filters; this yields the row keys as byte arrays.
> Once we have the row keys, we build a list of Gets and pass it to get() on the
> Table object. We then iterate over the returned results and prepare a list
> that contains the compressed JSON objects. We do not know the size or the
> number of these objects in advance. If the number of objects is huge, we may
> run out of memory (OOM). Is there an option to return an Iterator, or to
> buffer the results, so that we can avoid the OOM?
> {{for (byte[] rowkey : indexTableOutput) {
>   Get get = new Get(rowkey).addFamily(Bytes.toBytes(columnFamilty)).setMaxVersions(MAX_VERSIONS);
>   listOfget.add(get);
> }}}
> The code above builds the list of Gets from the row keys retrieved from the index table.
> {{TableName tableName = TableName.valueOf("table1");
> Table tableObj = conn.getTable(tableName);
> Result[] results = tableObj.get(listOfget);}}
> We have a few questions about the code above. Any help would be
> appreciated.
> * If there is a huge amount of data, will Result[] contain all of the results?
> * How can we return an iterator-like object and leave the processing to the
> consumer? Holding all of the data in memory while processing it will result in an OOM.
> * Are there any other options for returning a limited amount of data, so that
> the consumer can process it and then continue?
> I found that a ResultScanner is returned for Scan objects, but I could not find
> an equivalent for a list of Gets. In our case we know the exact keys from the
> index table.
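
One possible way to get iterator-style consumption for a known list of keys is to split the keys into fixed-size batches and fetch each batch only when the consumer asks for more. The following is a minimal sketch of that idea in plain Java, not an HBase API: the class name BatchedFetch, the batch size, and the fetcher function are all hypothetical. In the setup described above, the fetcher would presumably wrap a call such as tableObj.get(...) on a sublist of the Gets; here it is a stand-in lambda so the example runs without an HBase dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Lazily fetches values for a list of keys, one fixed-size batch at a time. */
final class BatchedFetch<K, V> implements Iterator<V> {
    private final List<K> keys;
    private final int batchSize;
    // Hypothetical stand-in for a bulk lookup such as Table.get(List<Get>).
    private final Function<List<K>, List<V>> fetcher;
    private int nextKey = 0;
    private Iterator<V> current = Collections.emptyIterator();

    BatchedFetch(List<K> keys, int batchSize, Function<List<K>, List<V>> fetcher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.keys = keys;
        this.batchSize = batchSize;
        this.fetcher = fetcher;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next batch only once the current one is exhausted,
        // so at most one batch of results is held at a time.
        while (!current.hasNext() && nextKey < keys.size()) {
            int end = Math.min(nextKey + batchSize, keys.size());
            current = fetcher.apply(keys.subList(nextKey, end)).iterator();
            nextKey = end;
        }
        return current.hasNext();
    }

    @Override
    public V next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        // The lambda simulates a bulk lookup; replace it with the real call.
        Iterator<String> it = new BatchedFetch<String, String>(rowKeys, 2, batch -> {
            List<String> out = new ArrayList<>();
            for (String key : batch) {
                out.add("value-of-" + key);
            }
            return out;
        });
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

With five keys and a batch size of 2, the fetcher is invoked three times (for 2, 2, and 1 keys), and the consumer only ever holds one batch of results. Memory use is then bounded by the batch size times the largest object, which is why a bound on the JSON object size still matters.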