shunping-google commented on PR #29202:
URL: https://github.com/apache/beam/pull/29202#issuecomment-1787479611

   Hi @stankiewicz, thanks so much for contributing to Beam!
   
   Per our discussion, I would recommend the following additional changes.
   1. Overloading the public functions to avoid breaking changes to the existed 
apis. (I see you have already submitted a new commit for that. Thanks for doing 
it!)
   2. Your current code works great when the number of header lines to skip is 
1. Would you please also add the logic for the case when we need to skip 
multiple header lines? Our Python SDK has a similar code path that you may find 
it useful.
   
https://github.com/apache/beam/blob/c7dc5bd3d0d5388c4417bc7f5fbea2b0cf93e02f/sdks/python/apache_beam/io/textio.py#L113
   Notice that in this scenario, **each** reader will have to skip the header 
lines and make sure their start position is larger than the last position of 
the header. This may hurt the performance. Therefore, we will keep the all the 
code paths (number of header lines to skip = 0/None, 1, >1) and choose the most 
efficient one based on user input. 
   3. Last but not least, I suggest using the same parameter name as our Python 
API, i.e.`skipHeaderLines`.
   
   Again, great work!
   
   Regards,
   
   Shunping


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to