Jacob Getto created HBASE-28181:
-----------------------------------
Summary: Support overriding the HFileOutputFormat2 RecordWriter's
outputDir
Key: HBASE-28181
URL: https://issues.apache.org/jira/browse/HBASE-28181
Project: HBase
Issue Type: Improvement
Reporter: Jacob Getto
Assignee: Jacob Getto
Currently, when HFileOutputFormat2 creates a RecordWriter, it derives the
output directory from the OutputCommitter. This is generally the desired
behavior, but we have a use case where it would be helpful to override it
explicitly.
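A minimal sketch of the proposed fallback, using only the Java standard
library: if an override is configured, use it; otherwise keep the directory
derived from the committer. The configuration key name and the use of a plain
Map in place of a Hadoop Configuration are assumptions made for illustration,
not the actual HBase API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class OutputDirResolver {
    // Hypothetical key name, chosen for this sketch; not a real HBase property.
    static final String OUTPUT_DIR_OVERRIDE_KEY = "example.hfile.output.dir.override";

    /**
     * Returns the override directory when one is configured and non-blank,
     * otherwise the work path that would be derived from the committer.
     * The Map stands in for a Hadoop Configuration (assumption).
     */
    static Path resolveOutputDir(Map<String, String> conf, Path committerWorkPath) {
        String override = conf.get(OUTPUT_DIR_OVERRIDE_KEY);
        if (override == null || override.trim().isEmpty()) {
            return committerWorkPath; // current behavior: committer decides
        }
        return Paths.get(override);   // proposed behavior: explicit override wins
    }
}
```

With no override set, the behavior is unchanged, so existing jobs would not
be affected under this design.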
The use case is essentially a variant of the MultiTableHFileOutputFormat
behavior. But rather than having one job that outputs data for a number of
tables, we would like to have one job subdivide its output so that it can be
bulk-loaded over the course of multiple distcp+LoadIncrementalHFiles runs. This
is helpful when an individual table's data needs to be sent to multiple
clusters, or when loading the data in chunks is needed for reliability and
stability.
Adding a path override configuration option to HFileOutputFormat2
would allow us to extend the output format and create multiple RecordWriters,
each configured with a separate output directory.
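As a rough illustration of how a job might subdivide its output, the sketch
below assigns each row key to a chunk and gives every chunk its own output
directory, which is where a per-chunk RecordWriter would be pointed. The
hash-based assignment, the "chunk-N" directory naming, and the class itself
are hypothetical; a real extension could just as well split by destination
cluster or key range.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChunkedOutputDirs {
    /** Picks a chunk index in [0, numChunks) for a row key (hypothetical policy). */
    static int chunkFor(String rowKey, int numChunks) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(rowKey.hashCode(), numChunks);
    }

    /** Builds one output directory per chunk beneath baseDir, in chunk order. */
    static Map<Integer, String> chunkDirs(String baseDir, int numChunks) {
        Map<Integer, String> dirs = new LinkedHashMap<>();
        for (int i = 0; i < numChunks; i++) {
            dirs.put(i, baseDir + "/chunk-" + i);
        }
        return dirs;
    }
}
```

Each resulting directory could then be copied and bulk-loaded in its own
distcp+LoadIncrementalHFiles run.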
--
This message was sent by Atlassian Jira
(v8.20.10#820010)