Open the driver UI and see which stage is taking the time. You can also check
whether GC time is adding to it, etc.
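
If you want a rough number outside the UI, a minimal sketch like the one
below can show how much GC time a piece of work adds inside a single JVM. It
uses only the standard java.lang.management API. The class name GcTimeProbe,
the helper names, and the workload in main are hypothetical stand-ins and are
not part of Spark or of the code in the question.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compares wall-clock time with cumulative GC time
// around a task running in the current JVM.
class GcTimeProbe {

    // Sum of the approximate accumulated GC time (ms) over all collectors.
    // A collector may report -1 when the value is undefined, so skip those.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Runs the task and returns { wall-clock ms, GC ms accumulated meanwhile }.
    static long[] measure(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        return new long[] { wallMillis, gcMillis };
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption): allocate enough to trigger some GC.
        long[] result = measure(() -> {
            List<Integer> values = new ArrayList<>();
            for (int i = 0; i < 2_000_000; i++) {
                values.add(i);
            }
        });
        System.out.println("wall ms = " + result[0] + ", GC ms = " + result[1]);
    }
}
```

If the GC figure is a large share of the wall-clock figure, memory pressure is
a likely suspect. If it is small, the time is probably going somewhere else,
for example into the work done per record.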

Thanks
Best Regards

On Thu, Apr 16, 2015 at 9:56 PM, Jeetendra Gangele <gangele...@gmail.com>
wrote:

> Hi all, I have the code below, where distinct is taking a long time to run.
>
> blockingRdd is a pair RDD of <Long,String> and it has 400K
> records
> JavaPairRDD<Long,Integer> completeDataToprocess = blockingRdd.flatMapValues(
>     new Function<String, Iterable<Integer>>() {
>         @Override
>         public Iterable<Integer> call(String v1) throws Exception {
>             return ckdao.getSingelkeyresult(v1);
>         }
>     }).distinct(32);
>
> I am running distinct on 800K records and it's taking 2 hours on 16 cores
> and 20 GB RAM.
>
