Hello, I was trying to profile a pipeline using Java's direct runner. It reads about 30 text files (CSV) of roughly 60 MB each. When I started the profiler, it reported more than 40K instances of TextSource being built, which really surprised me given the small size of the data being processed. I wonder whether I have found an over-splitting issue, arising after we moved to the SDF-based translation, that may affect simpler uses.
I have not dug deeper or filed a JIRA yet because I wanted to ask here first whether there is a 'valid' explanation for so many 'splits'.

Regards,
Ismaël
