Konstantin created MAHOUT-1487:
----------------------------------

             Summary: More understandable error message when attempt to use 
wrong FileSystem
                 Key: MAHOUT-1487
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1487
             Project: Mahout
          Issue Type: Improvement
          Components: Clustering
    Affects Versions: 0.9
         Environment: Amazon S3, Amazon EMR, Local file system
            Reporter: Konstantin
            Priority: Trivial
             Fix For: 1.0


RandomSeedGenerator has the following code:
FileSystem fs = FileSystem.get(output.toUri(), conf);
...
fs.getFileStatus(input).isDir() 

If the output path is specified correctly but the input path is not, Mahout 
throws an error message that is hard to understand: "Exception in thread "main" 
java.lang.IllegalArgumentException: This file system object 
(hdfs://172.31.41.65:9000) does not support access to the request path 
's3://by.kslisenko.bigdata/stackovweflow-small/out_new/sparse/tfidf-vectors' 
You possibly called FileSystem.get(conf) when you should have called 
FileSystem.get(uri, conf) to obtain a file system supporting your path"

This happens because the FileSystem object is created from the output path, 
while getFileStatus is called with the input path. The message does not say 
which of the two paths is wrong, so it is hard to work out what it means.

To prevent this confusion, I propose improving the error message by adding 
the following details:
1. Specify which file system type is in use (DistributedFileSystem, 
NativeS3FileSystem, etc., obtained with fs.getClass().getName()).
2. Specify which path cannot be processed.

This can be done with a validation utility that can be applied in many places 
in Mahout. Using Mahout means specifying many paths, and those paths can be on 
several types of file system: local for debugging, distributed on Hadoop, and 
S3 on Amazon. In this situation, better error messages can save a lot of time.
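
A minimal sketch of what such a validation utility could look like. It uses 
only java.net.URI so that it runs without Hadoop on the classpath. The class 
PathValidator and its methods belongsTo and checkPath are hypothetical names, 
not existing Mahout or Hadoop API, and the scheme/authority comparison is a 
simplification of the check Hadoop performs. In Mahout the arguments would 
presumably come from fs.getUri(), fs.getClass().getName() and path.toUri().

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of Mahout or Hadoop. */
final class PathValidator {

  private PathValidator() {}

  /**
   * Returns true if the path can be served by the file system at fsUri.
   * Simplified rule: a path without a scheme is accepted as relative to
   * the file system; otherwise the scheme must match, and the authority
   * must match as well when the path specifies one.
   */
  public static boolean belongsTo(URI fsUri, URI path) {
    if (path.getScheme() == null) {
      return true;
    }
    if (!path.getScheme().equalsIgnoreCase(fsUri.getScheme())) {
      return false;
    }
    if (path.getAuthority() == null) {
      return true;
    }
    return path.getAuthority().equalsIgnoreCase(fsUri.getAuthority());
  }

  /**
   * Throws an exception that names the role of the path ("input" or
   * "output"), the path itself, and the file system type and URI.
   */
  public static void checkPath(String role, URI fsUri, String fsClassName,
                               URI path) {
    if (!belongsTo(fsUri, path)) {
      throw new IllegalArgumentException(String.format(
          "Cannot access %s path '%s': file system %s (%s) does not "
              + "support this path",
          role, path, fsClassName, fsUri));
    }
  }
}
```

Called with role "input" before fs.getFileStatus(input), a check like this 
would report the input path and the file system class together, which covers 
both details proposed above.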



--
This message was sent by Atlassian JIRA
(v6.2#6252)
