Hi Raymond,

Thanks for starting this discussion.

Agree on 1. (We may also need some CLI support for inspecting bad records,
and also code samples to consume them, etc.?)
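
For the code samples, something along these lines could be a starting point. It is only a sketch: it assumes the errors land under `.hoodie/errors/` as one file per commit with one record per line, and `ErrorRecordInspector`/`readErrorRecords` are hypothetical names, not existing Hudi APIs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper for inspecting error records. Assumes one file per
 * commit under <basePath>/.hoodie/errors/, with one error record per line.
 */
final class ErrorRecordInspector {

  static List<String> readErrorRecords(Path basePath) throws IOException {
    Path errorDir = basePath.resolve(".hoodie").resolve("errors");
    if (!Files.isDirectory(errorDir)) {
      return Collections.emptyList(); // no failed records persisted yet
    }
    List<Path> files;
    try (Stream<Path> listing = Files.list(errorDir)) {
      // Sort by path for a stable order (commit order, if files are named by instant time).
      files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    List<String> records = new ArrayList<>();
    for (Path file : files) {
      records.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
    return records;
  }
}
```

A CLI command could then just call `readErrorRecords` for a table's base path and print or filter the lines.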

On 2, these places seem appropriate. We can figure out the details
when we get to implementation?

On 3, +1 on logs. We should also define a standard schema for the error
record. I see some tricky issues to handle here for schema mismatch
errors. For example, if the core problem was a schema mismatch, then
serializing/deserializing the error record without a working schema
specific to that record may not be possible. Maybe we need the record data
itself in a schemaless format like JSON?
I also wonder if we should write the error table as another internal
HoodieTable (we are abstracting out HoodieTable, FileGroupIO, etc. anyway)?
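
To illustrate the schemaless idea, here is a rough sketch of an error record that keeps a few fixed metadata fields and carries the failed record's data as JSON, so it can be read back even when the original schema is the problem. The class and field names are hypothetical (not an agreed schema), it handles flat fields only, and a real version would more likely use a JSON library than hand-rolled escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical error-record envelope: fixed metadata fields plus the failed
 * record's data rendered as schemaless JSON (flat, top-level fields only).
 */
final class ErrorRecord {
  final String commitTime;
  final String partitionPath;
  final String recordKey;
  final String errorMessage;
  final Map<String, Object> data; // failed record's fields, no schema attached

  ErrorRecord(String commitTime, String partitionPath, String recordKey,
              String errorMessage, Map<String, Object> data) {
    this.commitTime = commitTime;
    this.partitionPath = partitionPath;
    this.recordKey = recordKey;
    this.errorMessage = errorMessage;
    this.data = new LinkedHashMap<>(data); // keep field order stable
  }

  String toJson() {
    StringBuilder sb = new StringBuilder("{");
    sb.append("\"commitTime\":").append(value(commitTime)).append(',');
    sb.append("\"partitionPath\":").append(value(partitionPath)).append(',');
    sb.append("\"recordKey\":").append(value(recordKey)).append(',');
    sb.append("\"errorMessage\":").append(value(errorMessage)).append(',');
    sb.append("\"data\":{");
    boolean first = true;
    for (Map.Entry<String, Object> e : data.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      first = false;
      sb.append(quote(e.getKey())).append(':').append(value(e.getValue()));
    }
    return sb.append("}}").toString();
  }

  private static String value(Object v) {
    if (v == null) {
      return "null";
    }
    if (v instanceof Double && !Double.isFinite((Double) v)) {
      return quote(v.toString()); // JSON has no NaN/Infinity literals
    }
    if (v instanceof Float && !Float.isFinite((Float) v)) {
      return quote(v.toString());
    }
    if (v instanceof Number || v instanceof Boolean) {
      return v.toString();
    }
    return quote(v.toString()); // strings and anything else become JSON strings
  }

  private static String quote(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c)); // other control chars
          } else {
            sb.append(c);
          }
      }
    }
    return sb.append('"').toString();
  }
}
```

The trade-off is that readers lose type information for the payload, but they never need the (possibly broken) writer schema to decode it.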

On 4, +1 again.
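
Putting 1, 2 and 4 together, the hook called from `postWrite`/`completeCompaction` could roughly look like the sketch below. It is hypothetical throughout: `ErrorRecordWriter`, the `<instantTime>.errors` file name and the `LongConsumer` standing in for the metrics reporter are placeholders, and plain text lines stand in for the Avro log format from 3.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.LongConsumer;

final class ErrorRecordWriter {

  /**
   * Hypothetical post-write hook: persists one commit's failed records under
   * <basePath>/.hoodie/errors/ and reports how many were written.
   * Plain text lines are a stand-in for the proposed Avro log format.
   */
  static long persistFailedRecords(Path basePath, String instantTime,
                                   List<String> failedRecords,
                                   LongConsumer errorCountMetric) throws IOException {
    if (failedRecords.isEmpty()) {
      errorCountMetric.accept(0L); // still report, so the metric resets to zero
      return 0L;
    }
    Path errorDir = basePath.resolve(".hoodie").resolve("errors");
    Files.createDirectories(errorDir);
    Path errorFile = errorDir.resolve(instantTime + ".errors");
    Files.write(errorFile, failedRecords, StandardCharsets.UTF_8);
    long count = failedRecords.size();
    errorCountMetric.accept(count); // e.g. a gauge the monitoring system alerts on
    return count;
  }
}
```

Reporting zero for clean commits keeps the metric from staying stuck at the last non-zero value, which makes alert thresholds simpler.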

On Fri, May 22, 2020 at 7:47 PM Shiyan Xu wrote:

> Hi all,
>
> I'd like to bring up this discussion around handling errors in Hudi write
> paths.
> https://issues.apache.org/jira/browse/HUDI-648
>
> I'm trying to gather some feedback on the implementation details:
> 1. Error location
> I'm thinking of writing the failed records to `.hoodie/errors/` in order to
> a) encapsulate the data within the Hudi table for ease of management
> b) make use of the existing dedicated directory
>
> 2. Write path
> org.apache.hudi.client.HoodieWriteClient#postWrite
> org.apache.hudi.client.HoodieWriteClient#completeCompaction
> These two methods should be where the failed records in
> `org.apache.hudi.table.action.HoodieWriteMetadata#writeStatuses`
> are persisted to the designated location.
>
> 3. Format
> Records should be written as logs (Avro).
>
> 4. Metric
> After writing the failed records, it should emit a metric with the basic
> count of errors written. This makes it easier for a monitoring system to
> pick up and send alerts.
>
> Some details will likely need to be adjusted during development. To
> begin with, we can agree on a feasible plan at a high level.
>
> Please share your thoughts and feedback. Thank you.
>
>
>
> Regards,
> Raymond
>
