[ https://issues.apache.org/jira/browse/SPARK-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
darion yaphet updated SPARK-21066:
----------------------------------
    Description: 
Currently, when we use SVM to train on a dataset, only one input file is accepted.

A file stored on a distributed file system such as HDFS is split into multiple pieces, so this limit does not seem necessary. We could join the input paths into a single comma-separated string.

  was:
Currently, when we use SVM to train on a dataset, only one input file is accepted. The relevant source code is:

{{{
val path = if (dataFiles.length == 1) {
  dataFiles.head.getPath.toUri.toString
} else if (dataFiles.isEmpty) {
  throw new IOException("No input path specified for libsvm data")
} else {
  throw new IOException("Multiple input paths are not supported for libsvm data.")
}
}}}

A file stored on a distributed file system such as HDFS is split into multiple pieces, so this limit does not seem necessary. We could join the input paths into a single comma-separated string.


> LibSVM load just one input file
> -------------------------------
>
>                 Key: SPARK-21066
>                 URL: https://issues.apache.org/jira/browse/SPARK-21066
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.1
>            Reporter: darion yaphet
>
> Currently, when we use SVM to train on a dataset, only one input file is
> accepted.
> A file stored on a distributed file system such as HDFS is split into
> multiple pieces, so this limit does not seem necessary.
> We could join the input paths into a single comma-separated string.


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
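The change the description proposes could look like the following sketch (hypothetical; the actual logic lives in Spark's LibSVM data source and may differ between versions). Instead of rejecting multiple input files, the non-empty case joins all file URIs with commas:

{{{
// Hypothetical rewrite of the path-selection logic quoted above:
// accept any non-empty list of input files by joining their URIs.
val path = if (dataFiles.isEmpty) {
  throw new IOException("No input path specified for libsvm data")
} else {
  dataFiles.map(_.getPath.toUri.toString).mkString(",")
}
}}}

This keeps the empty-input error unchanged and makes the single-file case a special instance of the general one, since mkString on a one-element list returns just that element.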