[ https://issues.apache.org/jira/browse/SPARK-40286 ]


    Drew deleted comment on SPARK-40286:
    ------------------------------

was (Author: JIRAUSER295165):
In this case, before loading data into the table, my S3 bucket contains 
`kv1.txt`. Then, when I run the code block above, the file is removed from the 
S3 bucket directory. The data is in the table when I run 
`spark.sql('select * from src')`. I was wondering if that's expected?
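
For what it's worth, a minimal sketch of a non-destructive alternative, assuming 
the same `s3://bucket/kv1.txt` path and a ^A-delimited key/value layout (the 
delimiter and schema are assumptions here): read the file into a DataFrame and 
insert it into the table instead of using LOAD DATA INPATH, which moves the 
source file into the table's location.

{code:python}
# Sketch only: assumes s3://bucket/kv1.txt holds ^A-delimited key/value rows.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) STORED AS textfile")

# Read the S3 object into a DataFrame; this does not touch the source file.
df = spark.read.csv("s3://bucket/kv1.txt", sep="\x01",
                    schema="key INT, value STRING")

# Write into the Hive table instead of moving the raw file into its location.
df.write.insertInto("src", overwrite=True)
{code}

Unlike LOAD DATA INPATH, this copies the data into the table's storage and 
leaves the original S3 object in place.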

> Load Data from S3 deletes data source file
> ------------------------------------------
>
>                 Key: SPARK-40286
>                 URL: https://issues.apache.org/jira/browse/SPARK-40286
>             Project: Spark
>          Issue Type: Question
>          Components: Documentation
>    Affects Versions: 3.2.1
>            Reporter: Drew
>            Priority: Major
>
> Hello,
> I'm using Spark to [load 
> data|https://spark.apache.org/docs/latest/sql-ref-syntax-dml-load.html] into 
> a Hive table through PySpark, and when I load data from a path in Amazon S3, 
> the original file is wiped from the directory. The file is found and 
> populates the table with data. I also tried adding the `LOCAL` clause, but 
> that throws an error when looking for the file. The documentation doesn't 
> explicitly state that this is the intended behavior.
> Thanks in advance!
> {code:python}
> spark.sql("CREATE TABLE src (key INT, value STRING) STORED AS textfile")
> spark.sql("LOAD DATA INPATH 's3://bucket/kv1.txt' OVERWRITE INTO TABLE src")
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
