shunping-google commented on PR #29202: URL: https://github.com/apache/beam/pull/29202#issuecomment-1787479611
Hi @stankiewicz, thanks so much for contributing to Beam! Per our discussion, I would recommend the following additional changes. 1. Overloading the public functions to avoid breaking changes to the existed apis. (I see you have already submitted a new commit for that. Thanks for doing it!) 2. Your current code works great when the number of header lines to skip is 1. Would you please also add the logic for the case when we need to skip multiple header lines? Our Python SDK has a similar code path that you may find it useful. https://github.com/apache/beam/blob/c7dc5bd3d0d5388c4417bc7f5fbea2b0cf93e02f/sdks/python/apache_beam/io/textio.py#L113 Notice that in this scenario, **each** reader will have to skip the header lines and make sure their start position is larger than the last position of the header. This may hurt the performance. Therefore, we will keep the all the code paths (number of header lines to skip = 0/None, 1, >1) and choose the most efficient one based on user input. 3. Last but not least, I suggest using the same parameter name as our Python API, i.e.`skipHeaderLines`. Again, great work! Regards, Shunping -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
