Jacob Getto created HBASE-28181:
-----------------------------------

             Summary: Support overriding the HFileOutputFormat2 RecordWriter's outputDir
                 Key: HBASE-28181
                 URL: https://issues.apache.org/jira/browse/HBASE-28181
             Project: HBase
          Issue Type: Improvement
            Reporter: Jacob Getto
            Assignee: Jacob Getto


Currently, when HFileOutputFormat2 creates a RecordWriter, it derives the 
output directory from the OutputCommitter. This is generally the desired 
behavior, but we have a use case where it would be helpful to override it 
explicitly.

The use case is essentially a variant of the MultiTableHFileOutputFormat 
behavior. But rather than having one job that outputs data for a number of 
tables, we would like to have one job subdivide its output so that it can be 
bulk-loaded over the course of multiple distcp+LoadIncrementalHFiles runs. This 
is helpful when an individual table's data needs to be sent to multiple 
clusters, or when the data needs to be loaded in chunks for reliability and 
stability.

Adding a path override configuration option to the MultiTableHFileOutputFormat 
would allow us to extend the output format and create multiple RecordWriters, 
each configured with separate output directories. 
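
As a rough illustration of the idea (not the proposed patch), the sketch below 
models the lookup in plain Java with no HBase or Hadoop dependency: use an 
explicit override directory when one is configured, otherwise fall back to the 
committer-derived path. The configuration key, the class name, and both helper 
methods are hypothetical and exist only for this example; a plain Map stands in 
for the job configuration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class OutputDirOverrideSketch {
    // Hypothetical key, invented for this sketch; not an existing HBase property.
    static final String OUTPUT_DIR_OVERRIDE_KEY =
            "sketch.hfileoutputformat.output.dir.override";

    // Pick the directory a RecordWriter would write under: the explicit
    // override when one is configured, otherwise the committer-derived path.
    static String resolveOutputDir(Map<String, String> conf, String committerWorkPath) {
        String override = conf.get(OUTPUT_DIR_OVERRIDE_KEY);
        if (override == null || override.trim().isEmpty()) {
            return committerWorkPath;
        }
        return override;
    }

    // Derive one configuration per chunk: each is a copy of the base
    // configuration with its own override directory under rootDir.
    static Map<String, Map<String, String>> perChunkConfs(
            Map<String, String> base, String rootDir, String... chunkNames) {
        Map<String, Map<String, String>> confs = new LinkedHashMap<>();
        for (String chunk : chunkNames) {
            Map<String, String> copy = new HashMap<>(base);
            copy.put(OUTPUT_DIR_OVERRIDE_KEY, rootDir + "/" + chunk);
            confs.put(chunk, copy);
        }
        return confs;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        String committerPath = "/tmp/job/_temporary/attempt_0"; // placeholder path

        // No override configured: falls back to the committer-derived path.
        System.out.println(resolveOutputDir(base, committerPath));

        // One configuration per chunk, each resolving to its own directory.
        Map<String, Map<String, String>> confs =
                perChunkConfs(base, "/tmp/job/out", "chunk-a", "chunk-b");
        for (Map.Entry<String, Map<String, String>> e : confs.entrySet()) {
            System.out.println(e.getKey() + " -> "
                    + resolveOutputDir(e.getValue(), committerPath));
        }
    }
}
```

In an extending output format, each per-chunk configuration would presumably be 
handed to its own RecordWriter, so that every chunk lands in a directory that 
can be copied and bulk-loaded independently.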



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
