Jira Created: https://issues.apache.org/jira/browse/PHOENIX-2367 I will submit a patch for review soon.
On Tue, Nov 3, 2015 at 10:52 AM, Jan Fernando <jferna...@salesforce.com> wrote:
> +1 on making this change. Can you file a JIRA for it?
>
> On Mon, Nov 2, 2015 at 4:31 PM, Siddhi Mehta <sm26...@gmail.com> wrote:
> >
> > Hey All,
> >
> > I wanted to add a notion of skipping invalid rows for PhoenixHBaseStorage,
> > similar to how the CSVBulkLoad tool has an option of ignoring bad rows.
> > I did some work on the Apache Pig code that allows Storers to have a
> > notion of customizable/configurable error handling: PIG-4704
> > <https://issues.apache.org/jira/browse/PIG-4704>.
> >
> > I want to plug this behavior into PhoenixHBaseStorage and propose certain
> > changes for the same.
> >
> > *Current Behavior/Problem:*
> >
> > PhoenixRecordWriter uses executeBatch() to process rows once the batch
> > size is reached. If there are any client-side validation/syntactic
> > errors, such as data not fitting the column size, executeBatch() throws
> > an exception and there is no way to retrieve the valid rows from the
> > batch and retry them. We discard the whole batch or fail the job without
> > any error handling.
> >
> > With auto-commit set to false, execute() also serves the purpose of not
> > making any RPC calls: it performs a bunch of client-side validation and
> > adds the row to the client-side cache of mutations.
> >
> > Only on conn.commit() do we make an RPC call.
> >
> > *Proposed Change*
> >
> > To be able to use configurable error handling and ignore only the failed
> > records instead of discarding the whole batch, I propose changing the
> > behavior in PhoenixRecordWriter from executeBatch() to execute(), or
> > having a configuration to toggle between the two behaviors.
> >
> > Thoughts?
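
For anyone following along, the difference the thread is describing can be sketched roughly as below. This is a standalone simulation, not the real PhoenixRecordWriter or Phoenix JDBC classes: the validate() stand-in plays the role of the client-side validation Phoenix performs on execute()/executeBatch() (e.g. data exceeding the declared column size), and the two write methods contrast the current all-or-nothing batch failure with the proposed per-row execute() plus error handling.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipInvalidRowsSketch {

    // Stand-in for Phoenix's client-side validation, e.g. rejecting a
    // value that does not fit the column size. Hypothetical rule: any
    // value longer than 5 characters is "invalid".
    static void validate(String row) {
        if (row.length() > 5) {
            throw new IllegalArgumentException("value too long: " + row);
        }
    }

    // Current batch-style behavior: the first bad row aborts the whole
    // batch, and the valid rows in it cannot be recovered or retried.
    static List<String> writeBatch(List<String> rows) {
        for (String row : rows) {
            validate(row); // first failure throws; everything is discarded
        }
        return new ArrayList<>(rows);
    }

    // Proposed per-row behavior: validate each row individually (as
    // execute() with auto-commit off would), skip and record invalid
    // rows, and keep the valid ones for the eventual commit().
    static List<String> writePerRow(List<String> rows, List<String> errors) {
        List<String> accepted = new ArrayList<>();
        for (String row : rows) {
            try {
                validate(row);
                accepted.add(row); // would land in the client mutation cache
            } catch (IllegalArgumentException e) {
                errors.add(row); // hook for a configurable error handler
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("ok1", "way-too-long-value", "ok2");

        try {
            writeBatch(rows);
        } catch (IllegalArgumentException e) {
            System.out.println("batch: entire batch discarded");
        }

        List<String> errors = new ArrayList<>();
        List<String> accepted = writePerRow(rows, errors);
        System.out.println("per-row: accepted=" + accepted.size()
                + " skipped=" + errors.size());
    }
}
```

In the per-row variant the two valid rows survive and only the bad one is routed to the error handler, which is what PIG-4704's configurable error handling needs in order to work with PhoenixHBaseStorage.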