Hi,
Finally I found the reason.
It was caused by long GC pauses on some datanodes. After receiving the data
from the executors, a datanode stuck in a long GC cannot report its blocks to
the namenode, so the write takes a long time.
I have decommissioned the broken datanodes, and now my Spark job runs
Hi,
In my Spark batch job:
step 1: the driver assigns a partition of the JSON file path list to each
executor.
step 2: each executor fetches its assigned JSON files from S3 and saves them
into HDFS.
step 3: the driver reads these JSON files into a DataFrame and saves it as
Parquet.
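The three steps above can be modelled in plain Python (the real job would use Spark and actual S3/HDFS I/O; `fetch_from_s3` and `save_to_hdfs` below are hypothetical stand-ins, and the round-robin split is just one way the driver might partition the path list):

```python
# Sketch of the batch job's flow, modelled without a cluster.
# fetch_from_s3 / save_to_hdfs are hypothetical placeholders for the
# real I/O; the point is the partition-per-executor split.

def partition_paths(paths, num_executors):
    """Step 1: split the JSON path list into one chunk per executor."""
    chunks = [[] for _ in range(num_executors)]
    for i, path in enumerate(paths):
        chunks[i % num_executors].append(path)
    return chunks

def executor_work(chunk, fetch_from_s3, save_to_hdfs):
    """Step 2: each executor copies its assigned files from S3 to HDFS."""
    for path in chunk:
        save_to_hdfs(path, fetch_from_s3(path))
    # Step 3 happens back on the driver: read the HDFS JSON files into
    # a DataFrame and write them out as Parquet.

paths = [f"s3://bucket/data/{i}.json" for i in range(10)]
chunks = partition_paths(paths, num_executors=4)
print([len(c) for c in chunks])  # → [3, 3, 2, 2]
```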
To improve performance by
Hi all,
The `mapWithState` method in Spark Streaming takes an anonymous function.
This function maintains a state and should return a result; in other words,
the final stateful result can be obtained from the state object.
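The contract described above can be modelled in plain Python (a hedged sketch, not the real Scala API: the per-key state store and a running-sum state are assumptions for illustration):

```python
# Plain-Python model of the mapWithState contract: the update function
# sees (key, new value, current state), produces an updated state, and
# returns a mapped record that is emitted to the output stream.

def update_func(key, value, state):
    """Running-sum example: state accumulates, a result is emitted."""
    new_state = state + value
    return new_state, (key, new_state)

def map_with_state(batch, states):
    """Apply update_func across one micro-batch, like one streaming step."""
    out = []
    for key, value in batch:
        new_state, result = update_func(key, value, states.get(key, 0))
        states[key] = new_state  # framework persists state between batches
        out.append(result)
    return out

states = {}
print(map_with_state([("a", 1), ("b", 2), ("a", 3)], states))
# → [('a', 1), ('b', 2), ('a', 4)]
```

The emitted records come from the function's return value, while the state object carries the accumulated value forward to the next batch.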
So, what is the significance of