Update:

The tokenization stage completed in 20 minutes, but that's not evident in the
online UI. I found it by checking the S3 output folder:

2010-02-27 21:50  2696826329   s3://robinanil/wikipedia/tokenized-documents/part-00000
2010-02-27 21:52  2385184391   s3://robinanil/wikipedia/tokenized-documents/part-00001
2010-02-27 21:52  2458566158   s3://robinanil/wikipedia/tokenized-documents/part-00002
2010-02-27 21:53  2500213973   s3://robinanil/wikipedia/tokenized-documents/part-00003
2010-02-27 21:50  2533593862   s3://robinanil/wikipedia/tokenized-documents/part-00004
2010-02-27 21:54  3580695441   s3://robinanil/wikipedia/tokenized-documents/part-00005
2010-02-27 22:02           0   s3://robinanil/wikipedia/tokenized-documents_$folder$
2010-02-27 22:02           0   s3://robinanil/wikipedia/wordcount/subgrams/_temporary_$folder$
2010-02-27 22:02           0   s3://robinanil/wikipedia/wordcount/subgrams_$folder$
2010-02-27 22:02           0   s3://robinanil/wikipedia/wordcount_$folder$
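If anyone wants to do the same check, a listing in that format can be produced
with something like s3cmd (assuming it is installed and configured; any S3
listing tool works just as well):

    s3cmd ls --recursive s3://robinanil/wikipedia/

That prints date, size, and the s3:// URI of every object under the prefix, so
you can see the part-0000x outputs appear even before the job tracker / online
UI reflects the finished step.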
