Hi, I’m trying to run terasort with the latest Crail (v1.2-rc2-1-g8a739dd) and I’m getting the error below.
(Job aborted due to stage failure: Task 36 in stage 1.0 failed 4 times, most recent failure: Lost task 36.3 in stage 1.0)

In the log, there is never a getBlock call to fd 19318 for that task, and I also see that getBlock is called 6 times for the previous fd (19153), but with different positions. Is that wrong? Could the namenode be hitting a collision, or be stuck? I also only see these tasks (36.x) running on one executor.

BTW, I should note that I’m not running with com.ibm.crail.terasort.sorter.CrailShuffleNativeRadixSorter or com.ibm.crail.terasort.serializer.F22Serializer, as I couldn’t get them to run without error. I get an “NYI” assertion error when those are used. Would this matter?

20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_36/1-4-35352996
20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_36/1-4-35352996, success, fd 19318
20/01/09 10:34:35 INFO crail: CoreInputStream: open, path /spark/shuffle/shuffle_0/part_36/1-4-35352996, fd 19318, streamId 836, isDir false, readHint 4754948
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 2097152, capacity 7070730
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 3145728, capacity 7070730
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 4194304, capacity 7070730
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_54/1-3-35352997
20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false
20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_54/1-3-35352997, success, fd 19079
20/01/09 10:34:35 INFO crail: CoreInputStream: open, path /spark/shuffle/shuffle_0/part_54/1-3-35352997, fd 19079, streamId 837, isDir false, readHint 7086206
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19079, token 0, position 1048576, capacity 7086206
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 5242880, capacity 7070730
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_36/3-1-35352995
20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false
20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_36/3-1-35352995, success, fd 18715
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: CoreInputStream: open, path /spark/shuffle/shuffle_0/part_36/3-1-35352995, fd 18715, streamId 838, isDir false, readHint 9487318
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19153, token 0, position 6291456, capacity 7070730
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 18715, token 0, position 1048576, capacity 9487318
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19079, token 0, position 2097152, capacity 7086206
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 19079, token 0, position 3145728, capacity 7086206
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 18715, token 0, position 2097152, capacity 9487318
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 DEBUG crail: RPC: getBlock, fd 18715, token 0, position 3145728, capacity 9487318
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: EndpointCache hit /192.168.2.100:4420, fsId 0
20/01/09 10:34:35 INFO crail: lookupDirectory: path /spark/shuffle/shuffle_0/part_55/1-4-35352996
20/01/09 10:34:35 DEBUG crail: RPC: getFile, writeable false
20/01/09 10:34:35 INFO crail: lookup: name /spark/shuffle/shuffle_0/part_55/1-4-35352996, success, fd 19337
20/01/09 10:34:35 INFO crail: CoreInputStream: open, path /spark/shuffle/shuffle_0/part_55/1-4-35352996, fd 19337, streamId 839, isDir false, readHint 4764488

Regards,
David
C: 714-476-2692