jimjam-slam commented on issue #45901:
URL: https://github.com/apache/arrow/issues/45901#issuecomment-2785340576
@amoeba Apologies for the slow reply! For R users, `{readr}` supports ragged
CSVs — it throws a warning for the short rows but still fills them with a
type-specific `NA`:
```r
csv_string <- "name,group,score
North,A,5
East,A
West,B,7
South
"
df <- readr::read_csv(csv_string)
#
# Rows: 4 Columns: 3
# ── Column specification ──────────────────────────────────────────────────────────────
# Delimiter: ","
# chr (2): name, group
# dbl (1): score
#
# ℹ Use `spec()` to retrieve the full column specification for this data.
# ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Warning message:
# One or more parsing issues, call `problems()` on your data frame for details, e.g.:
# dat <- vroom(...)
# problems(dat)
df
# # A tibble: 4 × 3
#   name  group score
#   <chr> <chr> <dbl>
# 1 North A         5
# 2 East  A        NA
# 3 West  B         7
# 4 South NA       NA
readr::problems(df)
# # A tibble: 2 × 5
#     row   col expected  actual    file
#   <int> <int> <chr>     <chr>     <chr>
# 1     3     2 3 columns 2 columns /private/var/folders/v3/ktxzq5ks2cz4xbvn975sp…
# 2     5     1 3 columns 1 columns /private/var/folders/v3/ktxzq5ks2cz4xbvn975sp…
```
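For comparison, the short rows that `problems()` reports can be found with a few lines of base R. This is only a rough sketch, not how `{readr}` detects them: it splits naively on commas, so it assumes no quoted fields, and `strsplit()` drops a trailing empty field. The names `lines`, `n_fields`, `expected` and `short_rows` are made up for illustration.

```r
csv_string <- "name,group,score
North,A,5
East,A
West,B,7
South
"

# Split the string into lines, then count comma-separated fields per line.
# Naive: assumes no quoted fields containing commas or newlines.
lines <- strsplit(csv_string, "\n", fixed = TRUE)[[1]]
n_fields <- lengths(strsplit(lines, ",", fixed = TRUE))

# Treat the header's field count as the expected width.
expected <- n_fields[1]

# Line numbers (header = line 1) with fewer fields than the header.
short_rows <- which(n_fields < expected)

data.frame(
  row = short_rows,
  expected = expected,
  actual = n_fields[short_rows]
)
```

The line numbers it flags (3 and 5) match the `row` column of the `problems()` output above, which also counts the header as row 1.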
The `{readr}` package also has `melt_csv()`, which is designed specifically
for ragged data. It was deprecated in readr 2.0.0 in favour of
`meltr::melt_csv()`, but for now it is still available in `{readr}`:
```r
df2 <- readr::melt_csv(csv_string)
# Warning message:
# `melt_csv()` was deprecated in readr 2.0.0.
# ℹ Please use `meltr::melt_csv()` instead
# This warning is displayed once every 8 hours.
# Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
df2
# # A tibble: 12 × 4
#      row   col data_type value
#    <dbl> <dbl> <chr>     <chr>
#  1     1     1 character name
#  2     1     2 character group
#  3     1     3 character score
#  4     2     1 character North
#  5     2     2 character A
#  6     2     3 integer   5
#  7     3     1 character East
#  8     3     2 character A
#  9     4     1 character West
# 10     4     2 character B
# 11     4     3 integer   7
# 12     5     1 character South
```
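If a rectangular table is wanted afterwards, the melted form can be pivoted back by using `row` and `col` as matrix indices, leaving the missing cells as `NA`. The sketch below uses only base R and a hand-built data frame called `melted` that stands in for `df2` (so it runs without `{readr}`). The names `wide`, `header` and `body` are illustrative, and the conversion of `score` to numeric is an assumption about the intended column type.

```r
# Hand-built stand-in for the melted output above (row, col, value only).
melted <- data.frame(
  row = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  col = c(1, 2, 3, 1, 2, 3, 1, 2, 1, 2, 3, 1),
  value = c("name", "group", "score", "North", "A", "5",
            "East", "A", "West", "B", "7", "South"),
  stringsAsFactors = FALSE
)

# Start from an all-NA character matrix, then fill only the cells that exist.
wide <- matrix(NA_character_, nrow = max(melted$row), ncol = max(melted$col))
wide[cbind(melted$row, melted$col)] <- melted$value

# The first row holds the column names; the rest is the data.
header <- wide[1, ]
body <- as.data.frame(wide[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(body) <- header

# Assumption: score is meant to be numeric; missing cells stay NA.
body$score <- as.numeric(body$score)
body
```

Cells that were absent from the ragged input end up as `NA`, which is the same padding `read_csv()` applied in the first example.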