+1 on making this change. Can you file a JIRA for it?

On Mon, Nov 2, 2015 at 4:31 PM, Siddhi Mehta <sm26...@gmail.com> wrote:

> Hey All,
>
> I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage,
> similar to how the CSVBulkLoad tool has an option to ignore bad rows. I did
> some work on the Apache Pig code that allows Storers to have a notion of
> customizable/configurable error handling: PIG-4704
> <https://issues.apache.org/jira/browse/PIG-4704>.
>
> I wanted to plug this behavior for PhoenixHbaseStorage and propose certain
> changes for the same.
>
> *Current Behavior/Problem:*
>
> PhoenixRecordWriter makes use of executeBatch() to process rows once the
> batch size is reached. If there are any client-side validation or syntactic
> errors, such as data not fitting the column size, executeBatch() throws an
> exception, and there is no way to retrieve the valid rows from the batch
> and retry them. We discard the whole batch or fail the job without any
> error handling.
>
> With auto-commit set to false, execute() also serves the purpose of not
> making any RPC calls: it does a bunch of client-side validation and adds
> the row to the client-side cache of mutations.
>
> Only on conn.commit() do we make an RPC call.
>
> *Proposed Change*
>
> To be able to use configurable error handling and ignore only the failed
> records instead of discarding the whole batch, I want to propose changing
> the behavior in PhoenixRecordWriter from executeBatch() to execute(), or
> having a configuration to toggle between the two behaviors.
>
> Thoughts?
>
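To illustrate the difference being proposed: a minimal sketch of batch-style vs. per-row execution, using Python's sqlite3 purely as a stand-in for the Phoenix JDBC connection (the actual change would live in PhoenixRecordWriter's Java code; the table, rows, and NOT NULL constraint here are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT NOT NULL)")

rows = [(1, "ok"), (2, None), (3, "also ok")]  # row 2 violates NOT NULL

# Batch-style (analogous to executeBatch()): one bad row fails the whole
# batch, and the valid rows cannot be recovered -- we have to roll back.
try:
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
except sqlite3.IntegrityError:
    conn.rollback()

# Per-row style (analogous to execute() with auto-commit off): validate
# and buffer each row individually, routing only the bad ones to error
# handling, then make one commit for the surviving rows.
bad = []
for row in rows:
    try:
        conn.execute("INSERT INTO t VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        bad.append(row)  # skipped row goes to configurable error handling
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2 rows kept
print(len(bad))  # 1 row skipped
```

The per-row variant costs extra client-side calls but, as the quoted mail notes, with auto-commit off those calls stay local until commit, so no additional RPCs are incurred.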
