[jira] [Created] (HBASE-14696) allowPartialResults in mapreduce Mappers

Mindaugas Kairys (JIRA) Sun, 25 Oct 2015 23:35:37 -0700

Mindaugas Kairys created HBASE-14696:
----------------------------------------


             Summary: allowPartialResults in mapreduce Mappers
                 Key: HBASE-14696
                 URL: https://issues.apache.org/jira/browse/HBASE-14696
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
    Affects Versions: 1.1.0, 2.0.0
            Reporter: Mindaugas Kairys
            Assignee: Ted Yu


It is currently impossible to get partial results in mapreduce mapper jobs.

Even when the scan is configured with setAllowPartialResults(true), such jobs 
still fail with an OOME (OutOfMemoryError) on large rows.

The reason is that the Scan field allowPartialResults is lost during job 
creation:

  1. The user creates a Job and passes a Scan object via 
TableMapReduceUtil.initTableMapperJob(table_name, scanObj,...), which stores 
the result of TableMapReduceUtil.convertScanToString(scanObj) in the job config.

  2. When the job starts, the method TableInputFormat.setConf retrieves the 
scan string from the config and converts it back to a Scan object by calling 
TableMapReduceUtil.convertStringToScan. The resulting Scan object always has 
the field allowPartialResults set to false.

As an experiment, I modified the TableInputFormat method setConf() to force 
all scans to allow partial results. After this change all jobs succeeded with 
no more OOMEs, and I also noticed that the mappers began to receive partial 
results (Result.isPartial()).
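The loss described in the two steps above can be illustrated with a toy model. 
The sketch below is not HBase code: ToyScan, toConfigString and 
fromConfigString are hypothetical stand-ins for Scan, convertScanToString and 
convertStringToScan. It only assumes that the serialized form omits the flag, 
which is what the observed behaviour suggests.

```java
// Toy stand-in for a scan object; NOT the HBase Scan class.
final class ToyScan {
    int caching = 100;
    boolean allowPartialResults = false; // default, as reported for the real field
}

final class ScanRoundTripDemo {
    // Hypothetical serializer: writes caching but forgets allowPartialResults.
    static String toConfigString(ToyScan scan) {
        return "caching=" + scan.caching;
    }

    // Hypothetical deserializer: it could read the flag, but the flag is
    // never present in the string, so the default (false) always wins.
    static ToyScan fromConfigString(String serialized) {
        ToyScan scan = new ToyScan();
        for (String pair : serialized.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                continue;
            }
            if (kv[0].equals("caching")) {
                scan.caching = Integer.parseInt(kv[1]);
            } else if (kv[0].equals("allowPartialResults")) {
                scan.allowPartialResults = Boolean.parseBoolean(kv[1]);
            }
        }
        return scan;
    }

    public static void main(String[] args) {
        ToyScan original = new ToyScan();
        original.caching = 500;
        original.allowPartialResults = true;

        ToyScan restored = fromConfigString(toConfigString(original));
        // caching survives the round trip, the flag does not
        System.out.println(restored.caching + " " + restored.allowPartialResults);
        // prints: 500 false
    }
}
```

In this model the fix would be to write the flag in toConfigString; whether 
the real serialization can carry the field is not something this sketch shows.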

My use case is very simple: I have large rows and expect a mapper to receive 
them in parts, i.e. to get the same rowid several times with different 
key/value records.
This would spare me from implementing my own result-partitioning solution: the 
large number of key/values in a single large row would instead be returned 
transparently in several parts.
Moreover, if a Scan object can return several records for the same rowid 
(partial results), perhaps the mapper should do the same.
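To make the use case concrete, here is a small sketch of a consumer that sees 
the same rowid several times. It is a toy model, not a real mapper: 
ToyPartialResult and countCellsPerRow are hypothetical names, and each chunk 
stands for one partial result handed to the mapper. Only one chunk is 
processed at a time, so the whole row never has to fit in memory.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for one partial result: a row id plus some of the row's cells.
final class ToyPartialResult {
    final String rowId;
    final List<String> cells;

    ToyPartialResult(String rowId, String... cells) {
        this.rowId = rowId;
        this.cells = Arrays.asList(cells);
    }
}

final class PartialRowDemo {
    // Counts cells per row, handling each chunk independently
    // (roughly what one map() call per partial result would do).
    static Map<String, Integer> countCellsPerRow(List<ToyPartialResult> chunks) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (ToyPartialResult chunk : chunks) {
            counts.merge(chunk.rowId, chunk.cells.size(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "row1" arrives in two parts, "row2" in one.
        List<ToyPartialResult> chunks = Arrays.asList(
            new ToyPartialResult("row1", "cf:a", "cf:b"),
            new ToyPartialResult("row1", "cf:c", "cf:d", "cf:e"),
            new ToyPartialResult("row2", "cf:a"));
        System.out.println(countCellsPerRow(chunks)); // prints {row1=5, row2=1}
    }
}
```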







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
