mustafasrepo opened a new pull request, #6526:
URL: https://github.com/apache/arrow-datafusion/pull/6526
# Which issue does this PR close?
Closes #
# Rationale for this change
This PR adds support for the following SQL queries:
```sql
CREATE EXTERNAL TABLE source_table (
a1 VARCHAR NOT NULL,
a2 INT NOT NULL
)
STORED AS CSV
WITH HEADER ROW
OPTIONS ('UNBOUNDED' 'TRUE')
LOCATION '{source}';
CREATE EXTERNAL TABLE sink_table (
a1 VARCHAR NOT NULL,
a2 INT NOT NULL
)
STORED AS CSV
WITH HEADER ROW
OPTIONS ('UNBOUNDED' 'TRUE')
LOCATION '{sink}';
INSERT INTO sink_table
SELECT a1, a2 FROM source_table;
```
This PR adds support for appending data to external tables. Previously,
appending was supported only for memory tables. The PR introduces new structs
and modifies existing ones so that users can append data to file-based
storage systems.
# What changes are included in this PR?
- Added `FileSinkConfig` struct, which holds the base configuration used when
creating a physical plan for any given file format.
- Added `FileWriterExt` to handle writing record batches to a file-like
output.
- Added `CsvSink` struct, which implements `DataSink` to write results
to a CSV file.
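To make the division of labor in the list above more concrete, here is a minimal, standard-library-only sketch of the general idea: a sink trait that accepts batches, and a CSV implementation that appends them to any file-like writer. It is not the code from this PR. The names `Batch`, `SimpleSink`, `SimpleCsvSink`, and `append` are hypothetical, and the real `DataSink` and `CsvSink` operate on Arrow record batches with different signatures.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a record batch: rows of string fields.
/// (Illustration only; not the Arrow `RecordBatch` type.)
pub struct Batch {
    pub rows: Vec<Vec<String>>,
}

/// Hypothetical, simplified sink trait; NOT the real `DataSink` signature.
pub trait SimpleSink {
    /// Appends every batch to the sink and returns the number of rows written.
    fn append(&mut self, batches: &[Batch]) -> io::Result<u64>;
}

/// Hypothetical CSV sink over any `Write` target (file, in-memory buffer, ...).
pub struct SimpleCsvSink<W: Write> {
    writer: W,
    header: Option<Vec<String>>,
    header_written: bool,
}

impl<W: Write> SimpleCsvSink<W> {
    pub fn new(writer: W, header: Option<Vec<String>>) -> Self {
        Self { writer, header, header_written: false }
    }

    /// Consumes the sink and returns the underlying writer.
    pub fn into_inner(self) -> W {
        self.writer
    }

    /// Writes one comma-separated record followed by a newline.
    fn write_record(&mut self, fields: &[String]) -> io::Result<()> {
        let line = fields.iter().map(|f| escape(f)).collect::<Vec<_>>().join(",");
        writeln!(self.writer, "{}", line)
    }
}

/// Quotes a field if it contains a comma, a double quote, or a newline.
fn escape(field: &str) -> String {
    if field.contains(',') || field.contains('"') || field.contains('\n') {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

impl<W: Write> SimpleSink for SimpleCsvSink<W> {
    fn append(&mut self, batches: &[Batch]) -> io::Result<u64> {
        // Emit the header once, before the first data row.
        if !self.header_written {
            if let Some(header) = self.header.clone() {
                self.write_record(&header)?;
            }
            self.header_written = true;
        }
        let mut count = 0u64;
        for batch in batches {
            for row in &batch.rows {
                self.write_record(row)?;
                count += 1;
            }
        }
        self.writer.flush()?;
        Ok(count)
    }
}
```

Keeping the sink generic over `Write` means the same code path can target a `std::fs::File` opened in append mode or an in-memory `Vec<u8>`, which is convenient for tests.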
# Are these changes tested?
Yes
# Are there any user-facing changes?
Yes. Users can now append data to external tables with `INSERT INTO`, which
was not possible before. This makes it practical to use file-based storage
systems as query sinks.