Mindaugas Kairys created HBASE-14696:
----------------------------------------
Summary: allowPartialResults in mapreduce Mappers
Key: HBASE-14696
URL: https://issues.apache.org/jira/browse/HBASE-14696
Project: HBase
Issue Type: Improvement
Components: mapreduce
Affects Versions: 1.1.0, 2.0.0
Reporter: Mindaugas Kairys
Assignee: Ted Yu
It is currently impossible to get partial results in mapreduce mapper jobs.
Even when setAllowPartialResults(true) is called on the Scan passed to a job,
the job still fails with OOME on large rows.
The reason is that the Scan field allowPartialResults is lost during job
creation:
1. The user creates a Job and passes a Scan object to
TableMapReduceUtil.initTableMapperJob(table_name, scanObj,...), which puts the
result of TableMapReduceUtil.convertScanToString(scanObj) into the job config.
2. When the job starts, the method TableInputFormat.setConf retrieves the scan
string from the config and converts it back to a Scan object by calling
TableMapReduceUtil.convertStringToScan, which yields a Scan object whose
allowPartialResults field is always false.
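The two steps above can be illustrated with a small self-contained model. This
is only a sketch under a stated assumption: that the serialized form simply has
no slot for the flag. ToyScan and both convert methods below are hypothetical
stand-ins written for this illustration; they are not the HBase classes and do
not reproduce the real serialization format.

```java
// Hypothetical stand-in for a scan object; NOT org.apache.hadoop.hbase.client.Scan.
final class ToyScan {
    String startRow = "";
    String stopRow = "";
    boolean allowPartialResults = false; // default, as described in the report
}

final class ScanRoundTripDemo {
    // Assumed behavior: the string form carries the row range but not the flag.
    static String convertScanToString(ToyScan scan) {
        return "startRow=" + scan.startRow + ";stopRow=" + scan.stopRow;
    }

    // Rebuilds a scan from the string; any field absent from the string
    // silently keeps its default value.
    static ToyScan convertStringToScan(String serialized) {
        ToyScan scan = new ToyScan();
        for (String pair : serialized.split(";")) {
            String[] kv = pair.split("=", 2);
            String value = kv.length > 1 ? kv[1] : "";
            if (kv[0].equals("startRow")) {
                scan.startRow = value;
            } else if (kv[0].equals("stopRow")) {
                scan.stopRow = value;
            }
        }
        return scan;
    }

    // Sets the flag, round-trips the scan, and reports what the flag is afterwards.
    static boolean flagSurvivesRoundTrip() {
        ToyScan original = new ToyScan();
        original.startRow = "row-000";
        original.stopRow = "row-999";
        original.allowPartialResults = true;
        ToyScan restored = convertStringToScan(convertScanToString(original));
        return restored.allowPartialResults;
    }

    public static void main(String[] args) {
        System.out.println("flag after round trip: " + flagSurvivesRoundTrip());
    }
}
```

In this model the row range survives the round trip while the flag falls back to
false, which matches the symptom described above.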
I experimented by modifying the TableInputFormat method setConf() to force all
scans to allow partial results. After this change all jobs succeeded with no
more OOME, and I also noticed that the mappers began to get partial results
(Result.isPartial()).
My use case is very simple: I have large rows and expect a mapper to receive
them in parts, that is, to get the same rowid several times with different
key/value records.
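A minimal sketch of what this looks like from the consumer side, assuming the
mapper is invoked once per chunk and the same rowid may therefore arrive more
than once. Chunk and countCellsPerRow are hypothetical names introduced for
this illustration only; Chunk is not the HBase Result class, and the cell names
are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartialRowDemo {
    // Hypothetical stand-in for one mapper input; NOT org.apache.hadoop.hbase.client.Result.
    static final class Chunk {
        final String rowId;
        final List<String> cells;

        Chunk(String rowId, List<String> cells) {
            this.rowId = rowId;
            this.cells = cells;
        }
    }

    // Processes chunks one at a time, so a whole large row never has to be
    // held in memory; only a running count per rowid is kept.
    static Map<String, Integer> countCellsPerRow(Iterable<Chunk> chunks) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Chunk chunk : chunks) {
            counts.merge(chunk.rowId, chunk.cells.size(), Integer::sum);
        }
        return counts;
    }

    // A large row "row1" delivered as two chunks, followed by a small row "row2".
    static List<Chunk> sampleChunks() {
        return Arrays.asList(
            new Chunk("row1", Arrays.asList("cf:a", "cf:b")),
            new Chunk("row1", Arrays.asList("cf:c")),
            new Chunk("row2", Arrays.asList("cf:a")));
    }

    public static void main(String[] args) {
        System.out.println(countCellsPerRow(sampleChunks()));
    }
}
```

The point of the sketch is that per-row work which can be done incrementally
(counting, summing, emitting cells downstream) does not need the full row at
once.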
This would spare me from implementing my own result partitioning solution,
because the large number of key values in a single large row would be returned
transparently in parts.
On the other hand, if a Scan object can return several records for the same
rowid (partial results), perhaps the mapper should do the same.