Thanks for the description. It seemed odd that it behaved this way as HDFS
does close as expected, so I wasn’t sure. Wouldn’t this change the Terasort
benchmark numbers?



Regards,

David

C: 714-476-2692



________________________________
From: Jonas Pfefferle <peppe...@japf.ch>
Sent: Wednesday, June 19, 2019 12:17:30 AM
To: dev@crail.apache.org; David Crespi; d...@crail.incubator.apache.org
Subject: Re: Crail used as type 2 storage for TeraSort does not catch the 
"finished" signal

Hi David,


Unfortunately, if you use Crail for input/output with Spark, this is
expected. The problem is that Spark never closes the filesystem correctly. I
haven't looked into this lately, but if I remember correctly there was no
other easy way to determine that Spark is about to shut down.

Regards,
Jonas

  On Tue, 18 Jun 2019 22:17:16 +0000
  David Crespi <david.cre...@storedgesystems.com> wrote:
> Hi,
> I’m running Crail as the temporary backend storage for Terasort.
> After each section (TeraGen, TeraSort, TeraVerify) the program waits
> until a Ctrl-C is given, then moves on to the next section. Is this
> the expected behavior, or is this a bug?
>
> Here’s a small snippet of the output. Terasort waits where the bolded
> “Number of records” is listed, until the ^C is given. Each of the
> three programs does the same, but the program does finish without
> errors.
>
>
> 19/06/18 15:13:19 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1.0, runningTasks: 1
> 19/06/18 15:13:19 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 142 ms on 192.168.3.10 (executor 4) (1/2)
> 19/06/18 15:13:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.3.12:34011 (size: 1825.0 B, free: 366.3 MB)
> 19/06/18 15:13:19 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1.0, runningTasks: 0
> 19/06/18 15:13:19 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 977 ms on 192.168.3.12 (executor 3) (2/2)
> 19/06/18 15:13:19 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
> 19/06/18 15:13:19 INFO DAGScheduler: ResultStage 1 (count at TeraGen.scala:94) finished in 0.995 s
> 19/06/18 15:13:19 DEBUG DAGScheduler: After removal of stage 1, remaining stages = 0
> 19/06/18 15:13:19 INFO DAGScheduler: Job 1 finished: count at TeraGen.scala:94, took 1.003537 s
> Number of records written: 10000
> ^C19/06/18 15:13:36 INFO SparkContext: Invoking stop() from shutdown hook
>
> Regards,
>
>           David
>
>
