Owen,

Thank you for your answer and advice.
I will publish the patch if I manage to produce something acceptable.

Best regards,
Aleksey Yatsenko.

On Tue, Mar 3, 2020 at 19:11, Owen O'Malley <[email protected]> wrote:

>
>
> On Tue, Mar 3, 2020 at 6:43 AM Aleksey Yatsenko <
> [email protected]> wrote:
>
>> Hello,
>>
>> First of all, I would like to thank you and your colleagues for the ORC
>> library, and sorry for the direct message.
>>
>
> You're welcome. I hope it is ok that I'm cc'ing the ORC dev list.
>
>> I plan to create ORC files using the C++ API and I found out that a file
>> may contain stripes that cross HDFS block boundaries. There are no
>> corresponding configuration parameters in the WriterOptions class and none
>> of the required logic in WriterImpl::add() (the equivalent of padStripe()
>> in PhysicalFsWriter.java).
>> Could you please clarify whether this functionality will ever be
>> implemented or give a couple of tips on how to do it myself :).
>>
>
> I haven't heard of anyone implementing HDFS block padding on the C++ side.
> It should be relatively easy for you to add, especially if you use the Java
> code as an example. As the writer finishes the stripe, you can calculate
> how many bytes the finished stripe will be and from there figure out the
> number of padding bytes if required. Do make sure that the padding doesn't
> exceed a configured threshold to avoid corner cases with more padding than
> stripe.
>
> Please do contribute it back to the project.
>
>> I also found out that libhdfspp supports neither writing nor
>> "short circuit reads". I plan to use libhdfs3 from the Apache
>> HAWQ project, which promises both of these features.
>>
>
> There is always more work to be done.
>
> Thanks,
>    Owen O'Malley
>
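
The padding calculation Owen describes above could be sketched in C++ roughly
as follows. This is a minimal sketch under stated assumptions, not code from
the ORC library: the function name computeStripePadding and its parameters
(fileOffset, stripeSize, blockSize, maxPadding) are hypothetical, and the
writer is assumed to know the size of the finished stripe before it decides
whether to pad.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the ORC C++ API.
// Returns the number of padding bytes to write before a stripe so that the
// stripe starts on the next HDFS block boundary, or 0 if no padding applies.
//   fileOffset : current write position in the file (where the stripe would start)
//   stripeSize : total size in bytes of the finished stripe
//   blockSize  : HDFS block size in bytes
//   maxPadding : configured upper bound on padding bytes per stripe
uint64_t computeStripePadding(uint64_t fileOffset,
                              uint64_t stripeSize,
                              uint64_t blockSize,
                              uint64_t maxPadding) {
  if (blockSize == 0) {
    return 0;  // padding disabled or block size unknown
  }
  const uint64_t usedInBlock = fileOffset % blockSize;
  if (usedInBlock == 0) {
    return 0;  // stripe already starts on a block boundary
  }
  const uint64_t remainingInBlock = blockSize - usedInBlock;
  if (stripeSize <= remainingInBlock) {
    return 0;  // stripe fits in the current block
  }
  // The stripe would cross the boundary. Pad only if the waste stays within
  // the threshold, which avoids writing more padding than is acceptable.
  return remainingInBlock <= maxPadding ? remainingInBlock : 0;
}
```

For example, with a 256-byte block, a 100-byte stripe starting at offset 200
would get 56 bytes of padding if maxPadding is at least 56, and none otherwise.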
