alamb opened a new pull request #9840:
URL: https://github.com/apache/arrow/pull/9840


   Note: this builds on the code in #9818, so I am putting it up as a draft 
until that PR is merged.
   
   # Rationale
   
   Provide schema metadata access (so a user can see what columns exist and 
their types).
   
   See the doc for background: 
https://docs.google.com/document/d/12cpZUSNPqVH9Z0BBx6O8REu7TFqL-NPPAYCUPpDls1k/edit#
   
   I plan to add support for `SHOW COLUMNS` possibly as a follow on PR (though 
I have found out that `SHOW COLUMNS` and `SHOW TABLES` are not supported by 
either MySQL or by Postgres :thinking_face:)
   
   # Changes
   I chose to add the first 15 columns from `information_schema.columns`. You 
can see the full list for Postgres 
[here](https://www.postgresql.org/docs/9.5/infoschema-columns.html) and for SQL 
Server 
[here](https://docs.microsoft.com/en-us/sql/relational-databases/system-information-schema-views/columns-transact-sql?view=sql-server-ver15).
 
   
   There are many more columns that are documented as "Applies to features not 
available in PostgreSQL" and that don't apply to DataFusion either. Since my 
use case is to get the basic schema information out, I chose not to add columns 
that would always be null at this time.
   
   I feel the use of column builders here is somewhat awkward (as it requires 
many calls to `unwrap`). I am thinking of a follow-on PR to refactor this code 
to use `Vec<String>` and `Vec<u64>` and then create `StringArray` and 
`UInt64Array` directly from them, but for now I just want the functionality.
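
To make the possible refactor concrete, here is a rough, stdlib-only sketch of accumulating the rows in plain `Vec<String>` / `Vec<u64>` columns. The `ColumnsAccumulator` type and its `add_column` method are hypothetical names used only for illustration (they are not in this PR), and the final step of turning each `Vec` into an Arrow array is left out:

```rust
/// Hypothetical accumulator (not part of this PR): one plain Vec per
/// output column of `information_schema.columns`.
#[derive(Debug, Default)]
struct ColumnsAccumulator {
    table_names: Vec<String>,
    column_names: Vec<String>,
    ordinal_positions: Vec<u64>,
    is_nullable: Vec<String>,
    data_types: Vec<String>,
}

impl ColumnsAccumulator {
    /// Record one row describing a single column of a table.
    /// Pushing onto a Vec cannot fail, so no `unwrap` is needed here.
    fn add_column(
        &mut self,
        table: &str,
        column: &str,
        position: u64,
        nullable: bool,
        data_type: &str,
    ) {
        let nullable_flag = if nullable { "YES" } else { "NO" };
        self.table_names.push(table.to_string());
        self.column_names.push(column.to_string());
        self.ordinal_positions.push(position);
        self.is_nullable.push(nullable_flag.to_string());
        self.data_types.push(data_type.to_string());
    }

    /// Number of rows recorded so far.
    fn len(&self) -> usize {
        self.table_names.len()
    }
}

fn main() {
    let mut acc = ColumnsAccumulator::default();
    // Same rows as the example table `t` used later in this description.
    acc.add_column("t", "a", 0, false, "Int32");
    acc.add_column("t", "b", 1, false, "Utf8");
    acc.add_column("t", "c", 2, false, "Float32");
    // Assumption: each Vec would then be converted into the matching
    // Arrow array (`StringArray` / `UInt64Array`) in a single step.
    println!("{} rows: {:?}", acc.len(), acc);
}
```

The appeal of this shape is that all fallible work would be confined to the one place where the vectors become arrays, instead of being spread across every builder call.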
   
   
   # Example use
   
   Setup:
   ```
   echo "1,Foo,44.9" > /tmp/table.csv
   echo "2,Bar,22.1" >> /tmp/table.csv
   cargo run --bin datafusion-cli
   ```
   
   
   Then run:
   
   ```
   > CREATE EXTERNAL TABLE t(a int, b varchar, c float)
   STORED AS CSV
   LOCATION '/tmp/table.csv';
   0 rows in set. Query took 0 seconds.
   
   > select table_name, column_name, ordinal_position, is_nullable, data_type
   from information_schema.columns;
   +------------+-------------+------------------+-------------+-----------+
   | table_name | column_name | ordinal_position | is_nullable | data_type |
   +------------+-------------+------------------+-------------+-----------+
   | t          | a           | 0                | NO          | Int32     |
   | t          | b           | 1                | NO          | Utf8      |
   | t          | c           | 2                | NO          | Float32   |
   +------------+-------------+------------------+-------------+-----------+
   3 row in set. Query took 0 seconds.
   ```
   
   
   
   

