Possible issue with bounded Read translation using SDF

Ismaël Mejía Fri, 18 Dec 2020 08:16:03 -0800

Hello,

I was trying to profile a pipeline using Java's direct runner. It
reads about 30 text files (CSV) of ~60MB each. When I started the
profiler, it reported more than 40K instances of TextSource being
built, which really surprised me given the small size of the data
being processed. I wonder whether I have found an over-splitting issue,
introduced when we moved to the SDF-based translation, that may affect
simpler uses.


I have not dug deeper or created a JIRA because I wanted to ask here
first to see if there is a 'valid' explanation for so many 'splits'.

Regards,
Ismaël
