Owen,

Thank you for your answer and advice. I will publish the patch if I can produce something acceptable.

Best regards,
Aleksey Yatsenko.

On Tue, Mar 3, 2020 at 19:11, Owen O'Malley <[email protected]> wrote:

> On Tue, Mar 3, 2020 at 6:43 AM Aleksey Yatsenko <[email protected]> wrote:
>
>> Hello,
>>
>> First of all, I would like to thank you and your colleagues for the ORC
>> library, and sorry for the direct message.
>
> You're welcome. I hope it is ok that I'm cc'ing the ORC dev list.
>
>> I plan to create ORC files using the C++ API, and I found out that a file
>> may contain stripes which cross HDFS block boundaries. There are no
>> corresponding configuration parameters in the WriterOptions class and no
>> required logic in WriterImpl::add() (the same as in padStripe() in
>> PhysicalFsWriter.java).
>> Could you please clarify whether this functionality will ever be
>> implemented or give a couple of tips on how to do it myself :).
>
> I haven't heard of anyone implementing HDFS block padding on the C++ side.
> It should be relatively easy for you to add, especially if you use the Java
> code as an example. As the writer finishes the stripe, you can calculate
> how many bytes the finished stripe will be and from there figure out the
> number of padding bytes if required. Do make sure that the padding doesn't
> exceed a configured threshold to avoid corner cases with more padding than
> stripe.
>
> Please do contribute it back to the project.
>
>> I also found out that libhdfspp does not support "writing" or the
>> "short circuit reads" functionality. I plan to use libhdfs3 from the Apache
>> HAWQ project, which promises both of these features.
>
> There is always more work to be done.
>
> Thanks,
> Owen O'Malley
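
For reference, the padding calculation described in the quoted reply could be sketched roughly as below. This is only an illustration under stated assumptions, not the ORC implementation: `computeStripePadding` and all of its parameters are hypothetical names that do not exist in the ORC C++ API, and the sketch assumes the file starts at an HDFS block boundary, so the position inside the current block is the stripe's start offset modulo the block size.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the ORC C++ API).
// Returns the number of padding bytes to write before a stripe so that it
// starts on the next HDFS block boundary, or 0 if no padding should be added.
//
//   stripeStart - file offset where the stripe would begin
//   stripeSize  - total size in bytes of the finished stripe
//   blockSize   - HDFS block size in bytes
//   maxPadding  - configured upper limit on padding bytes
uint64_t computeStripePadding(uint64_t stripeStart,
                              uint64_t stripeSize,
                              uint64_t blockSize,
                              uint64_t maxPadding) {
  if (blockSize == 0) {
    return 0;  // block padding is effectively disabled
  }
  // Bytes already written into the current block (assumes the file itself
  // begins at a block boundary).
  uint64_t bytesInBlock = stripeStart % blockSize;
  if (bytesInBlock == 0) {
    return 0;  // the stripe already starts at a block boundary
  }
  if (stripeSize <= blockSize - bytesInBlock) {
    return 0;  // the stripe fits in the remainder of the block
  }
  // The stripe would cross the boundary: pad to the next block, but only if
  // the padding stays within the configured threshold.
  uint64_t padding = blockSize - bytesInBlock;
  return padding <= maxPadding ? padding : 0;
}
```

A writer would call something like this once the stripe size is known, write that many zero bytes, and advance the recorded stripe offset by the same amount. The `maxPadding` check is the threshold mentioned in the reply: it keeps the writer from emitting more padding than is worthwhile when only a small stripe remains to be written.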
