Hi guys,

We are trying to run our pipeline using the direct runner, and the input
dataset is a large number of HDFS files (a few hundred GB of data).

The run crashed with an OOM error. Then, reading the direct runner
documentation, I realized that the direct runner loads the whole dataset
into memory.

Is there any way we can avoid this OOM issue?

Regards

-------------------------------------------------------------

Wilson(Xiaoshuang) Wang
Sr. Software Engineer
