Siddhi Mehta created PHOENIX-2367:
-------------------------------------
Summary: Change PhoenixRecordWriter to use execute instead of
executeBatch
Key: PHOENIX-2367
URL: https://issues.apache.org/jira/browse/PHOENIX-2367
Project: Phoenix
Issue Type: Improvement
Reporter: Siddhi Mehta
Assignee: Siddhi Mehta
Hey All,
I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage,
similar to how the CSVBulkLoad tool has an option to ignore bad rows. I did
some work on the Apache Pig code that allows storers to have a notion of
customizable/configurable error handling (PIG-4704).
I would like to plug this behavior into PhoenixHbaseStorage and propose some
changes for the same.
Current Behavior/Problem:
PhoenixRecordWriter uses executeBatch() to process rows once the batch size
is reached. If there are any client-side validation/syntactic errors, such as
data not fitting the column size, executeBatch() throws an exception and there
is no way to retrieve the valid rows from the batch and retry them. We either
discard the whole batch or fail the job, without any error handling.
With auto-commit set to false, execute() also serves the purpose of not making
any RPC calls: it performs client-side validation and adds the row to the
client-side mutation cache.
Only on conn.commit() do we make an RPC call.
Proposed Change
To be able to use configurable error handling and ignore only the failed
records instead of discarding the whole batch, I want to propose changing the
behavior in PhoenixRecordWriter from executeBatch() to execute(), or having a
configuration to toggle between the two behaviors.
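The per-record approach could be sketched roughly as follows. This is a minimal illustration, not Phoenix's actual PhoenixRecordWriter code; a hypothetical RowWriter interface stands in for the PreparedStatement.execute() call so the skip-and-continue logic can be shown in isolation:

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of per-record error handling: each row is written
// individually, so a failure affects only that row, not the whole batch.
public class SkippingWriterSketch {

    // Hypothetical stand-in for PreparedStatement.execute() on one row.
    interface RowWriter {
        void write(String row) throws Exception;
    }

    // Attempts every row; returns the number of rows skipped due to
    // per-row failures.
    static int writeSkippingInvalid(List<String> rows, RowWriter writer) {
        int skipped = 0;
        for (String row : rows) {
            try {
                writer.write(row);  // client-side validation + buffer one mutation
            } catch (Exception e) {
                skipped++;          // only this row is discarded
            }
        }
        // In the real writer, conn.commit() would flush the buffered
        // mutations in a single RPC at this point.
        return skipped;
    }

    public static void main(String[] args) {
        // Rows longer than 5 chars simulate "data not fitting the column size".
        RowWriter writer = row -> {
            if (row.length() > 5) throw new Exception("value too long: " + row);
        };
        int skipped = writeSkippingInvalid(
                Arrays.asList("ok1", "way-too-long-value", "ok2"), writer);
        System.out.println("skipped=" + skipped);  // skipped=1
    }
}
```

With executeBatch(), the equivalent failure would surface once for the whole batch, with no portable way to recover the valid rows; the per-row loop above is what makes selective skipping possible.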
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)