[
https://issues.apache.org/jira/browse/LIVY-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gyorgy Gal updated LIVY-322:
----------------------------
Fix Version/s: 0.10.0
(was: 0.9.0)
This issue has been moved to the 0.10.0 release as part of a bulk update. If
you feel this issue was moved inappropriately, feel free to provide
justification and reset the Fix Version to 0.9.0.
> JsonParseException on failure to parse text output from subprocess call to
> hadoop fs -rm
> ----------------------------------------------------------------------------------------
>
> Key: LIVY-322
> URL: https://issues.apache.org/jira/browse/LIVY-322
> Project: Livy
> Issue Type: Bug
> Components: API, Interpreter
> Affects Versions: 0.3
> Reporter: Rick Bernotas
> Priority: Major
> Fix For: 0.10.0
>
> Attachments: patch_LIVY-322_rickbernotas.patch
>
>
> In a PySpark session, running subprocess.call() to execute "hadoop fs -rm"
> on a Hadoop 2.7 cluster produces a text response (reporting that the file
> was moved to the .Trash folder in HDFS). This response causes a
> JsonParseException in Livy, after which all subsequent statement executions
> in the session fail.
> I suspect something in the "hadoop fs" response trips up Livy's conversion
> to JSON, perhaps a reserved or special character that Livy is not filtering
> out, since the response is otherwise innocuous.
> Livy needs to parse the response correctly instead of throwing an
> exception, and if an exception is thrown anyway, the session should be
> able to recover and continue running statements. After the JSON exception,
> even a print(1) statement fails to execute properly, requiring the user to
> obtain a new session.
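> The parse failure can be reproduced outside Livy. The sketch below uses
> Python's json module as a stand-in for Jackson (which Livy actually uses):
> the shell output from "hadoop fs -rm" is plain text, not JSON, so any
> strict JSON parser rejects it at the first token.
> {code:python}
> import json
>
> # The text that "hadoop fs -rm" prints on a trash-enabled cluster; it is
> # not valid JSON, so parsing fails at the first token ("Moved").
> shell_output = "Moved: 'foo.tmp' to trash at: .Trash/Current"
> try:
>     json.loads(shell_output)
> except json.JSONDecodeError as exc:
>     print(exc)  # Expecting value: line 1 column 1 (char 0)
> {code}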
> Example follows below.
> {code:java}
> ### CREATE A NEW PYSPARK SESSION
> -bash-4.1$ curl -X POST --data '{"kind": "pyspark"}' -H "Content-Type:
> application/json" localhost:8998/sessions
> {"id":2,"appId":null,"owner":null,"proxyUser":null,"state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
> ### CHECK THE STATE OF SESSION 2 UNTIL IT GOES FROM "STARTING" STATE TO
> "IDLE" STATE
> -bash-4.1$ curl localhost:8998/sessions/2
> {"id":2,"appId":null,"owner":null,"proxyUser":null,"state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
> -bash-4.1$ curl localhost:8998/sessions/2
> {"id":2,"appId":null,"owner":null,"proxyUser":null,"state":"idle","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
> ### RUN THE PYSPARK CODE IN SESSION 2, "import subprocess"
> -bash-4.1$ curl localhost:8998/sessions/2/statements -X POST -H
> 'Content-Type: application/json' -d '{"code":"import subprocess"}'
> {"id":0,"state":"waiting","output":null}
> ### GET THE OUTPUT OF THE CODE JUST RUN IN SESSION 2
> -bash-4.1$ curl localhost:8998/sessions/2/statements/0
> {"id":0,"state":"available","output":{"status":"ok","execution_count":0,"data":{"text/plain":""}}}
> ### THE OUTPUT IS {"text/plain":""} WHICH IS EXPECTED AND CORRECT
> ### RUN THE PYSPARK CODE IN SESSION 2, "subprocess.call(["hadoop", "fs",
> "-touchz", "foo.tmp"])"
> -bash-4.1$ curl localhost:8998/sessions/2/statements -X POST -H
> 'Content-Type: application/json' -d '{"code":"subprocess.call([\"hadoop\",
> \"fs\", \"-touchz\", \"foo.tmp\"])"}'
> {"id":1,"state":"running","output":null}
> ### GET THE OUTPUT OF THE CODE JUST RUN IN SESSION 2
> -bash-4.1$ curl localhost:8998/sessions/2/statements/1
> {"id":1,"state":"available","output":{"status":"ok","execution_count":1,"data":{"text/plain":"0"}}}
> ### THE OUTPUT IS {"text/plain":"0"} WHICH IS EXPECTED OUTPUT THAT THE TOUCHZ
> COMPLETED WITH RETURN CODE 0.
> ### RUN THE PYSPARK CODE IN SESSION 2,
> "print(subprocess.check_output(["hadoop", "fs", "-ls", "foo.tmp"]))"
> -bash-4.1$ curl localhost:8998/sessions/2/statements -X POST -H
> 'Content-Type: application/json' -d
> '{"code":"print(subprocess.check_output([\"hadoop\", \"fs\", \"-ls\",
> \"foo.tmp\"]))"}'
> {"id":2,"state":"waiting","output":null}
> ### GET THE OUTPUT OF THE CODE JUST RUN IN SESSION 2
> -bash-4.1$ curl localhost:8998/sessions/2/statements/2
> {"id":2,"state":"available","output":{"status":"ok","execution_count":2,"data":{"text/plain":"-rw-------
> 3 username group 0 2017-02-23 19:26 foo.tmp"}}}
> ### THE OUTPUT IS {"text/plain":"-rw------- 3 username group 0
> 2017-02-23 19:26 foo.tmp"} WHICH IS EXPECTED OUTPUT OF DIRECTORY LISTING
> ### RUN THE PYSPARK CODE IN SESSION 2, "subprocess.call(["hadoop", "fs",
> "-rm", "foo.tmp"])"
> -bash-4.1$ curl localhost:8998/sessions/2/statements -X POST -H
> 'Content-Type: application/json' -d '{"code":"subprocess.call([\"hadoop\",
> \"fs\", \"-rm\", \"foo.tmp\"])"}'
> {"id":3,"state":"waiting","output":null}
> ### GET THE OUTPUT OF THE CODE JUST RUN IN SESSION 2
> -bash-4.1$ curl localhost:8998/sessions/2/statements/3
> {"id":3,"state":"available","output":{"status":"error","execution_count":3,"ename":"com.fasterxml.jackson.core.JsonParseException","evalue":"Unrecognized
> token 'Moved': was expecting ('true', 'false' or 'null')\n at [Source:
> Moved: 'foo.tmp' to trash at: .Trash/Current; line: 1, column:
> 6]","traceback":[]}}
> ### JSON EXCEPTION APPEARS HERE WHICH IS INCORRECT PARSING OF THE OUTPUT
> ### RUN THE PYSPARK CODE IN SESSION 2, "print(1)"
> -bash-4.1$ curl localhost:8998/sessions/2/statements -X POST -H
> 'Content-Type: application/json' -d '{"code":"print(1)"}'
> {"id":4,"state":"available","output":null}
> ### GET THE OUTPUT OF THE CODE JUST RUN IN SESSION 2
> -bash-4.1$ curl localhost:8998/sessions/2/statements/4
> {"id":4,"state":"available","output":{"status":"ok","execution_count":4,"data":{"text/plain":""}}}
> ### THE OUTPUT IS {"text/plain":""} WHICH IS EMPTY STRING, INDICATING
> OPERATION COMPLETED WITH NO OUTPUT, WHICH IS INCORRECT, IT SHOULD RETURN 1
> {code}
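> One possible direction for the fix, sketched in Python rather than Livy's
> actual Scala/Jackson code and using a hypothetical helper name: output
> that fails to parse as JSON could be returned as plain text instead of
> propagating the exception and wedging the session.
> {code:python}
> import json
>
> def parse_statement_output(raw):
>     """Hypothetical fallback: treat output that is not valid JSON as
>     plain text instead of letting the parse exception break the session."""
>     try:
>         return json.loads(raw)
>     except json.JSONDecodeError:
>         return {"text/plain": raw}
>
> print(parse_statement_output("Moved: 'foo.tmp' to trash at: .Trash/Current"))
> # {'text/plain': "Moved: 'foo.tmp' to trash at: .Trash/Current"}
> {code}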
--
This message was sent by Atlassian Jira
(v8.20.10#820010)