Paul Mazak created PIG-4663:
-------------------------------

             Summary: HBaseStorage could allow a row columns limit to avoid 
memory or scan timeout issues
                 Key: PIG-4663
                 URL: https://issues.apache.org/jira/browse/PIG-4663
             Project: Pig
          Issue Type: Improvement
            Reporter: Paul Mazak


The HBase client Scan API provides setMaxResultsPerColumnFamily, which caps 
the number of values returned per column family for each row, so a scan does 
not have to consume every column of a wide row.  Without such a cap, a single 
row with several thousand columns will likely cause Pig to fail with an 
OutOfMemoryError or a ScannerTimeoutException.

The suggestion is to add a '-maxResultsPerColumnFamily' option to 
HBaseStorage.  It would be passed in the optString parameter of the 
constructor, and HBaseStorage would set its value on the HBase Scan.
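
To make the proposal concrete, here is a minimal sketch in plain Java with no 
HBase or Pig dependency.  It is not the HBaseStorage implementation: the class 
and method names are hypothetical, the option parsing is a simple whitespace 
split rather than whatever parser HBaseStorage uses, and the use of -1 to mean 
"no limit" is a convention of this sketch only.  It shows how the value might 
be read from an optString and what a per-column-family cap does to a wide row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Pig or HBase.
public class MaxResultsSketch {

    // Returns the integer that follows "-maxResultsPerColumnFamily" in the
    // option string, or -1 (this sketch's "no limit" marker) if it is absent.
    static int parseMaxResultsPerColumnFamily(String optString) {
        String[] tokens = optString.trim().split("\\s+");
        for (int i = 0; i < tokens.length - 1; i++) {
            if (tokens[i].equals("-maxResultsPerColumnFamily")) {
                return Integer.parseInt(tokens[i + 1]);
            }
        }
        return -1;
    }

    // Simulates the effect of the cap on one row: for each column family,
    // keep at most `limit` columns. A negative limit keeps everything.
    static Map<String, List<String>> capPerFamily(
            Map<String, List<String>> row, int limit) {
        Map<String, List<String>> capped = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> family : row.entrySet()) {
            List<String> columns = family.getValue();
            int keep = (limit < 0) ? columns.size()
                                   : Math.min(limit, columns.size());
            capped.put(family.getKey(),
                       new ArrayList<>(columns.subList(0, keep)));
        }
        return capped;
    }

    public static void main(String[] args) {
        int limit = parseMaxResultsPerColumnFamily(
                "-loadKey true -maxResultsPerColumnFamily 2");

        // One row with two column families of different widths.
        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("cf1", Arrays.asList("a", "b", "c", "d"));
        row.put("cf2", Arrays.asList("x"));

        System.out.println(capPerFamily(row, limit));
    }
}
```

In the real loader the parsed value would presumably be handed to 
Scan.setMaxResultsPerColumnFamily rather than applied by hand, and a Pig 
script would pass it along the lines of 
HBaseStorage('cf:*', '-maxResultsPerColumnFamily 100'), where the option name 
is the one proposed above and does not exist yet.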



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
