Hello,

I was profiling a pipeline with Java's direct runner. It reads ~30
text files (CSV) of about 60MB each. When I started the profiler, it
reported more than 40K instances of TextSource being built, which
really surprised me given the small size of the data being processed.
I wonder whether I have found an over-splitting issue, introduced when
we moved to the SDF-based translation, that may affect simpler uses.

I have not dug deeper or filed a JIRA yet because I wanted to ask here
first, in case there is a 'valid' explanation for so many
'splits'.

Regards,
Ismaël
