[https://issues.apache.org/jira/browse/ARROW-12124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310716#comment-17310716]
Neville Dipale commented on ARROW-12124:
----------------------------------------
[~domoritz] I've commented on https://github.com/domoritz/csv2parquet/issues/2
with the solution to this issue.
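For context: the "magic bytes not found in footer" failure is the classic symptom of a Parquet writer that is never closed, since the footer metadata and the trailing `PAR1` magic are only flushed by `close()`. The sketch below uses a mock writer type, a stand-in rather than the real `parquet::arrow::ArrowWriter` API, to illustrate why skipping `close()` leaves a file that readers reject:

```rust
use std::io::Write;

/// Stand-in for a Parquet writer: like the real writers, it only emits
/// the footer (and its trailing `PAR1` magic) when `close()` is called.
struct MockParquetWriter<W: Write> {
    sink: W,
}

impl<W: Write> MockParquetWriter<W> {
    fn new(mut sink: W) -> std::io::Result<Self> {
        sink.write_all(b"PAR1")?; // header magic
        Ok(Self { sink })
    }

    fn write_batch(&mut self, data: &[u8]) -> std::io::Result<()> {
        self.sink.write_all(data) // row-group bytes
    }

    fn close(mut self) -> std::io::Result<W> {
        self.sink.write_all(b"PAR1")?; // footer magic -- written only here
        Ok(self.sink)
    }
}

fn main() -> std::io::Result<()> {
    // Forgetting close(): the output ends without the footer magic,
    // so a reader reports "magic bytes not found in footer".
    let mut w = MockParquetWriter::new(Vec::new())?;
    w.write_batch(b"rows")?;
    let truncated = w.sink; // simulate dropping the writer without close()
    assert!(!truncated.ends_with(b"PAR1"));

    // Calling close(): footer magic present, readers accept the file.
    let mut w = MockParquetWriter::new(Vec::new())?;
    w.write_batch(b"rows")?;
    let complete = w.close()?;
    assert!(complete.ends_with(b"PAR1"));
    println!("ok");
    Ok(())
}
```

With the real crate the same principle applies: call `writer.close()` (and check its `Result`) before the program exits, rather than letting the writer be dropped.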
> [Rust] Parquet writer creates invalid parquet files
> ---------------------------------------------------
>
> Key: ARROW-12124
> URL: https://issues.apache.org/jira/browse/ARROW-12124
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Dominik Moritz
> Priority: Major
>
> I wrote a simple CSV to Parquet converter at
> https://github.com/domoritz/csv2parquet/blob/f53feb5bd995eab41dee09f2c4d722512052d7ca/src/main.rs.
>
> Running it (`csv2parquet test.txt test.parquet`) with a simple file such as
> ```
> a,b,c
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> 0,1,hello world
> ```
> And then trying to read it back in Python with
> ```
> import pandas as pd
> df = pd.read_parquet('test.parquet')
> df.to_csv('test2.csv')
> ```
> results in this error:
> ```
> OSError: Could not open parquet input source '<Buffer>': Invalid: Parquet
> magic bytes not found in footer. Either the file is corrupted or this is not
> a parquet file.
> ```
> The schema itself seems to be inferred correctly:
> ```
> Inferred Schema:
> {
>   "fields": [
>     {
>       "name": "a",
>       "nullable": false,
>       "type": {
>         "name": "int",
>         "bitWidth": 64,
>         "isSigned": true
>       },
>       "children": []
>     },
>     {
>       "name": "b",
>       "nullable": false,
>       "type": {
>         "name": "int",
>         "bitWidth": 64,
>         "isSigned": true
>       },
>       "children": []
>     },
>     {
>       "name": "c",
>       "nullable": false,
>       "type": {
>         "name": "utf8"
>       },
>       "children": []
>     }
>   ],
>   "metadata": {}
> }
> ```
--
This message was sent by Atlassian Jira
(v8.3.4#803005)