iemejia commented on pull request #14755:
URL: https://github.com/apache/beam/pull/14755#issuecomment-837026342


   > Is there any typical source where you notice the regression? 
   
   We have received reports on multiple sources, but in particular on bounded 
file-based sources. I was surprised because I noticed this even with ParquetIO 
(which is based on an SDF implementation): simply by adding 
`--experiments=use_deprecated_read` we regained a performance benefit of at 
least 20% in our benchmarks (TPC-DS query 3 on the Spark runner).
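   For context, the experiments flag is passed like any other pipeline option on the command line. A minimal sketch of such an invocation on the Spark runner (the jar and main class names are placeholders, not from this PR):

   ```shell
   # Hypothetical invocation: re-run the same pipeline with the legacy
   # Read-based translation enabled via the experiments flag.
   # (my-pipeline-bundled.jar and org.example.MyBeamPipeline are placeholders.)
   spark-submit \
     --class org.example.MyBeamPipeline \
     my-pipeline-bundled.jar \
     --runner=SparkRunner \
     --experiments=use_deprecated_read
   ```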
   
   You can easily reproduce this by running pipelines with large amounts of 
data; with tiny data the difference is too small to notice, and sometimes SDF 
is even faster (e.g. in the Nexmark CI tests).
   
   I tried to reproduce the bounded performance regression on Dataflow, but 
there I do not see any considerable or consistent performance difference with 
or without `use_deprecated_read`.
   
   > And SDF read is default only for Spark Streaming, I'm curious about what 
kind of performance we are talking about here. Is it the throughput per second 
or watermark lag?
   
   Spark Streaming still uses the Read.Unbounded path; it has not yet been 
migrated to SDF. Luke was working on this but it was not finished when he 
left; for reference, see #13101.
   
   The regression for unbounded reads (the perceived delay in receiving 
messages) on the Direct Runner reported by @steveniemitz is probably 
sufficient reason to revert SDF as the default for the Direct Runner, and if 
we also consider the performance issues [reported on 
Flink](https://the-asf.slack.com/archives/C9H0YNP3P/p1607057900393900), I 
think we must return everything to the traditional Read-based translation 
until we have consistent results.
   

