caldempsey commented on issue #448: URL: https://github.com/apache/arrow-go/issues/448#issuecomment-3111522827

@zeroshade Hey, I appreciate the fast responses.

The `TableFromJSON` function actually returns one row out of 10K in my test when the result is passed to a DataFrame, and I've updated **_that_** GitHub issue with a complete standalone test you can use to repro the DataFrame problem, ready for you to plug in a Spark Connect URL if you fancy testing it. This is with:

```go
table, err := array.TableFromJSON(memory.DefaultAllocator, schema, jsonData, // one payload per line
	array.WithMultipleDocs(), // without this: "failed to create Arrow table from JSON: json doc must be an array, found {"
)
```

I _then_ looked at the other method, saw the same issue, and eventually worked out that whatever you set `WithChunk()` to is always equal to the total number of rows that get parsed, unless it's `-1`. So I trust this reproduces that problem:

```go
array.NewJSONReader(bytes.NewReader(ndjsonData), schema,
	array.WithAllocator(pool),
	array.WithChunk(100),
)
```

Apologies, the initial helper functions I provided were a bit scrappy. The `[]string{}` solution didn't work for me because of the performance issue above with real data (all my operations timed out, so it was unusable). I then tried passing a record for each member of the slice, which yields only the first chunk. Not being able to use `[]string{}` in my production use cases is how I went down this performance rabbit hole and eventually ended up here!
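
For reference, here's a minimal self-contained sketch of how I'd expect to drain the reader, in case it helps narrow things down. The single `id` column and the generated NDJSON payload are placeholders rather than my real schema/data, and the module version in the import path may need adjusting:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
	"github.com/apache/arrow-go/v18/arrow/memory"
)

func main() {
	// Placeholder schema and payload: 10K NDJSON lines of {"id": N}.
	schema := arrow.NewSchema([]arrow.Field{
		{Name: "id", Type: arrow.PrimitiveTypes.Int64},
	}, nil)

	var buf bytes.Buffer
	for i := 0; i < 10000; i++ {
		fmt.Fprintf(&buf, "{\"id\": %d}\n", i)
	}

	pool := memory.NewGoAllocator()
	rdr := array.NewJSONReader(bytes.NewReader(buf.Bytes()), schema,
		array.WithAllocator(pool),
		array.WithChunk(100), // each Next() should yield a 100-row record
	)
	defer rdr.Release()

	var total int64
	chunks := 0
	for rdr.Next() { // iterate every chunk, not just the first record
		total += rdr.Record().NumRows()
		chunks++
	}
	if err := rdr.Err(); err != nil {
		panic(err)
	}
	fmt.Println("chunks:", chunks, "total rows:", total)
}
```

With `WithChunk(100)` I'd expect 100 chunks and 10,000 total rows from this loop; what I'm describing above is only the first chunk's 100 rows ever coming back.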
@zeroshade Hey I appreciate the fast responses. The `TableFromJSON` function actually returns one row out of 10K in my test when passed to a DataFrame, and I've updated **_that_** GitHub issue with a complete standalone test you can use to repro the DataFrame problem, ready for you to plug a Spark Connect URL in if you do fancy testing it. This is with: ```go table, err := array.TableFromJSON(memory.DefaultAllocator, schema, jsonData, // one payload per line array.WithMultipleDocs(), // without this : failed to create Arrow table from JSON: json doc must be an array, found { ) ``` I _then_ looked at the other method, saw the same issue, and eventually worked out that actually whatever you set `WithChunks()` to, is always equal to the number of rows that get parsed, unless its `-1`. So I trust this reproduces that problem: ```go array.NewJSONReader(bytes.NewReader(ndjsonData), schema, array.WithAllocator(pool), array.WithChunk(100)) ``` Apologies, the initial helper functions I provided were a bit... Scrappy. The `[]string{}` solution didn't work for me due to the performance issue above with real data (all my operations timed out, so unusable). I then tried to pass a record to each member of the slice, which yields only the first chunk. Because of not being able to use `[]string{}` in my production level use cases, that's how I went down this performance rabbit hole, and eventually got here! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org