[
https://issues.apache.org/jira/browse/SPARK-34883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329132#comment-17329132
]
Brady Tello edited comment on SPARK-34883 at 4/22/21, 1:52 PM:
---------------------------------------------------------------
[~Vikas_Yadav]
Your `inputFile` path doesn't contain a colon. The following code uses a
dataset path that does contain one, and it fails with a URISyntaxException
as expected:
{code:java}
>>> inputFile = "/Users/home_dir/Workspaces/datasets/with:colon/iris.csv"
>>> tempDF = spark.read.csv(inputFile, multiLine=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/home_dir/Workspaces/spark/python/pyspark/sql/readwriter.py", line 737, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/Users/home_dir/Workspaces/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/Users/home_dir/Workspaces/spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: with:colon
{code}
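For context, here is a minimal Python sketch (plain standard library, not Spark code) of why a lone path component such as `with:colon` is ambiguous to a generic URI parser: the text before the colon can be read as a scheme, which leaves a relative path behind it. `leading_scheme` is a hypothetical helper introduced only for this illustration, and the sketch makes no claim about where Spark or Hadoop actually builds the URI.

```python
from urllib.parse import urlsplit


def leading_scheme(path: str) -> str:
    """Return what a generic URI parser treats as the scheme of `path`.

    Hypothetical helper for illustration only; returns '' when the
    parser finds no scheme.
    """
    return urlsplit(path).scheme


# The full path starts with '/', so nothing before the colon can be a scheme.
full_path = "/Users/home_dir/Workspaces/datasets/with:colon/iris.csv"

# The bare component from the error message has no leading '/', so the
# text before the colon is a syntactically valid scheme name.
component = "with:colon"

print(repr(leading_scheme(full_path)))
print(repr(leading_scheme(component)))
print(repr(urlsplit(component).path))
```

If the component is parsed on its own, `with` is taken as the scheme and `colon` is left as a path with no leading slash, which matches the wording "Relative path in absolute URI: with:colon".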
was (Author: bctello8):
[~Vikas_Yadav]
You don't have a colon in your `inputFile` path. See the following code that
contains a colon in the path to the dataset. It fails with a
URISyntaxException as expected:
{code:java}
>>> inputFile = "/Users/home_dir/Workspaces/datasets/with:colon/iris.csv"
>>> tempDF = spark.read.csv(inputFile, multiLine=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/brady.tello/Workspaces/spark/python/pyspark/sql/readwriter.py", line 737, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/Users/brady.tello/Workspaces/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/Users/brady.tello/Workspaces/spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: with:colon
{code}
> Setting CSV reader option "multiLine" to "true" causes URISyntaxException
> when colon is in file path
> ----------------------------------------------------------------------------------------------------
>
> Key: SPARK-34883
> URL: https://issues.apache.org/jira/browse/SPARK-34883
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0, 3.1.1
> Reporter: Brady Tello
> Priority: Major
>
> Setting the CSV reader's "multiLine" option to "True" throws the following
> exception when a ':' character is in the file path.
>
> {code:java}
> java.net.URISyntaxException: Relative path in absolute URI: test:dir
> {code}
> I've tested this in both Spark 3.0.0 and Spark 3.1.1 and I get the same error
> whether I use Scala, Python, or SQL.
> The following code works fine:
>
> {code:java}
> csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"
> tempDF = spark.read.option("sep", "\t").csv(csvFile)
> {code}
> While the following code fails:
>
> {code:java}
> csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"
> tempDF = (spark.read.option("sep", "\t")
>           .option("multiLine", "True").csv(csvFile))
> {code}
> Full Stack Trace from Python:
>
> {code:java}
> ---------------------------------------------------------------------------
> IllegalArgumentException                  Traceback (most recent call last)
> <command-8965899> in <module>
>       3 csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"
>       4
> ----> 5 tempDF = (spark.read.option("sep", "\t").option("multiLine", "True")
>
> /databricks/spark/python/pyspark/sql/readwriter.py in csv(self, path, schema,
> sep, encoding, quote, escape, comment, header, inferSchema,
> ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, nullValue, nanValue,
> positiveInf, negativeInf, dateFormat, timestampFormat, maxColumns,
> maxCharsPerColumn, maxMalformedLogPerPartition, mode,
> columnNameOfCorruptRecord, multiLine, charToEscapeQuoteEscaping,
> samplingRatio, enforceSchema, emptyValue, locale, lineSep, pathGlobFilter,
> recursiveFileLookup, modifiedBefore, modifiedAfter, unescapedQuoteHandling)
>     735             path = [path]
>     736         if type(path) == list:
> --> 737             return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>     738         elif isinstance(path, RDD):
>     739             def func(iterator):
>
> /databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
>    1302
>    1303         answer = self.gateway_client.send_command(command)
> -> 1304         return_value = get_return_value(
>    1305             answer, self.gateway_client, self.target_id, self.name)
>    1306
>
> /databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
>     114                 # Hide where the exception came from that shows a non-Pythonic
>     115                 # JVM exception message.
> --> 116                 raise converted from None
>     117             else:
>     118                 raise
>
> IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: test:dir
> {code}
>
>