Jatin Sharma created SPARK-39722:
------------------------------------

             Summary: Make Dataset.showString() public
                 Key: SPARK-39722
                 URL: https://issues.apache.org/jira/browse/SPARK-39722
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0, 2.4.8
            Reporter: Jatin Sharma


Currently, we have {{.show}} APIs on a Dataset, but they print directly to 
stdout.

But there are many cases where we need a String representation of the show 
output. For example:
 * We have a logging framework to which we need to push the representation of a 
DataFrame
 * We have to send the string over a REST call from the driver
 * We want to send the string to stderr instead of stdout

For such cases, one currently has to resort to a hack: temporarily redirect 
Console.out, capture the output in a ByteArrayOutputStream or similar, and then 
extract the string from it.
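
A minimal sketch of that workaround, for illustration only. The object and 
helper names (CaptureStdout, captureStdout) are hypothetical and not part of 
Spark. It relies on Scala's Console.withOut, and it only captures output 
printed through Scala's Console on the calling thread, which is how 
Dataset.show prints.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of Spark.
object CaptureStdout {
  // Runs `body` with Console.out redirected to an in-memory buffer and
  // returns everything printed through Scala's Console (print, println).
  def captureStdout(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    Console.withOut(buffer) {
      body
    }
    buffer.toString(StandardCharsets.UTF_8.name())
  }
}
```

With a DataFrame named df (assumed here), the call would look like 
CaptureStdout.captureStdout { df.show(20, truncate = false) }.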

Printing strictly to stdout seems like a limiting choice.

 

Solution:

We expose APIs that return the String representation. We already have the 
{{.showString}} method internally.

 

We could mirror the current {{.show}} APIs with corresponding {{.showString}} 
APIs (and rename the internal private function to something else if required).
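
A rough sketch of the mirrored API shape, using a toy stand-in rather than the 
real Dataset class. ToyDataset, its constructor, and its row rendering are all 
hypothetical. The only point is that each printing variant delegates to a 
String-returning variant with the same parameters.

```scala
// Hypothetical stand-in for Dataset, used only to illustrate the API shape.
final class ToyDataset(rows: Seq[String]) {
  // Proposed public variant: returns the rendered text instead of printing.
  def showString(numRows: Int): String =
    rows.take(numRows).mkString("\n")

  // Mirrors show(), which defaults to 20 rows.
  def showString(): String = showString(20)

  // Printing variants delegate to the String-returning ones.
  def show(numRows: Int): Unit = println(showString(numRows))
  def show(): Unit = show(20)
}
```

Callers that need the text (logging, REST, stderr) would call showString, while 
existing callers of show keep their current behavior.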



