Paul Mazak created PIG-4663:
-------------------------------
Summary: HBaseStorage could allow a row columns limit to avoid
memory or scan timeout issues
Key: PIG-4663
URL: https://issues.apache.org/jira/browse/PIG-4663
Project: Pig
Issue Type: Improvement
Reporter: Paul Mazak
The HBase client Scan API provides setMaxResultsPerColumnFamily, which caps the
number of columns returned per column family for each row, so a scan does not
have to consume every column of a wide row. Without such a limit, a single row
with several thousand columns is likely to make Pig fail with an
OutOfMemoryError or a ScannerTimeoutException.
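
To make the effect of such a limit concrete, here is a minimal sketch in plain Java. It does not use the HBase client at all: the row is modeled as an in-memory map from column family to column qualifiers, and the class name, method name, and "negative means no limit" convention are assumptions made for illustration only. With the real client, the equivalent step would be a call to setMaxResultsPerColumnFamily on the Scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one HBase row: column family -> qualifiers.
// It only models the capping behaviour and never talks to HBase.
class PerFamilyLimitSketch {

    // Keeps at most `limit` columns from each family.
    // Assumption of this sketch: a negative limit means "no limit".
    static Map<String, List<String>> capPerFamily(Map<String, List<String>> row, int limit) {
        Map<String, List<String>> capped = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> family : row.entrySet()) {
            List<String> columns = family.getValue();
            int keep = (limit < 0) ? columns.size() : Math.min(limit, columns.size());
            capped.put(family.getKey(), new ArrayList<>(columns.subList(0, keep)));
        }
        return capped;
    }

    public static void main(String[] args) {
        // One very wide family and one narrow family.
        List<String> wide = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            wide.add("q" + i);
        }
        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("cf1", wide);
        row.put("cf2", Arrays.asList("a", "b"));

        Map<String, List<String>> capped = capPerFamily(row, 100);
        // Prints "100 2": cf1 is truncated, cf2 is untouched.
        System.out.println(capped.get("cf1").size() + " " + capped.get("cf2").size());
    }
}
```

Only the wide family is truncated, which is why a per-family cap bounds the memory needed for a single row without affecting rows that are already small.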
The suggestion is to add a '-maxResultsPerColumnFamily' option to HBaseStorage.
It would be passed in the optString constructor parameter, and its value would
be set on the HBase Scan.
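
As a rough sketch of how the proposed option could be read from the option string, the following plain Java pulls a '-maxResultsPerColumnFamily' value out of a whitespace-separated optString. The class and method names are hypothetical, the hand-rolled tokenizer is a simplification and not how HBaseStorage actually parses its options, and returning -1 when the option is absent is an assumption of this sketch.

```java
// Hypothetical helper, for illustration only; not part of Pig.
class MaxResultsOptionSketch {

    static final String FLAG = "-maxResultsPerColumnFamily";

    // Returns the limit given after FLAG, or -1 if the option is absent.
    // Throws IllegalArgumentException if the value is missing, not a
    // number, or not positive.
    static int parseMaxResults(String optString) {
        String[] tokens = optString.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(FLAG)) {
                if (i + 1 >= tokens.length) {
                    throw new IllegalArgumentException(FLAG + " requires a value");
                }
                // NumberFormatException is a subclass of IllegalArgumentException.
                int limit = Integer.parseInt(tokens[i + 1]);
                if (limit <= 0) {
                    throw new IllegalArgumentException(FLAG + " must be positive");
                }
                return limit;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int limit = parseMaxResults("-loadKey true -maxResultsPerColumnFamily 500");
        // In the loader, a positive limit would then be applied with
        // scan.setMaxResultsPerColumnFamily(limit) on the HBase Scan.
        System.out.println(limit); // prints 500
    }
}
```

Treating an absent option as "no limit" would keep existing scripts behaving as they do today, which seems the least surprising default for an opt-in setting.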