[ https://issues.apache.org/jira/browse/ARROW-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583184#comment-17583184 ]
Adam Black edited comment on ARROW-17373 at 8/22/22 8:28 PM:
-------------------------------------------------------------

Here is another reprex. I think this only happens when the write location is the same as the current location of the open dataset. In the example above `writePath` and `savePath` are equal.

``` r
df <- data.frame(x = replicate(1, sample(0:1, 100e6, rep=TRUE)))
savePath <- file.path(tempdir(), 'arrowTest')
if (!dir.exists(savePath)) {
  dir.create(savePath)
}
arrow::write_feather(df, file.path(savePath, 'part-0.feather'))

# writing a dataset to a new directory ('arrowTest2') works fine
writePath <- file.path(tempdir(), 'arrowTest2')
if (!dir.exists(writePath)) {
  dir.create(writePath)
}
dataset <- arrow::open_dataset(savePath, format='feather')
nrow(dataset)
#> [1] 100000000
arrow::write_dataset(dataset = dataset, path = writePath, format = 'feather')
dataset2 <- arrow::open_dataset(writePath, format='feather')
nrow(dataset2)
#> [1] 100000000

# trying to write an open dataset to its own path gives an error
arrow::write_dataset(dataset = dataset2, path = writePath, format = 'feather')
#> Error: Invalid: Expected to read 144 metadata bytes but got 0

# and it modifies the dataset
nrow(dataset2)
#> [1] 1966080

# but with a smaller dataset there seems to be no issue
arrow::write_dataset(dataset = head(dataset, 1000), path = savePath, format = 'feather')
dataset3 <- arrow::open_dataset(savePath, format='feather')
nrow(dataset3)
#> [1] 1000
```

> [R] copying dataset and immediatly writing the copy to a different location fails
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-17373
>                 URL: https://issues.apache.org/jira/browse/ARROW-17373
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 9.0.0
>         Environment: Ubuntu 22.04
>            Reporter: Egill Axfjord Fridgeirsson
>            Priority: Major
>
> When I copy large feather files, open a dataset from that file and
> immediately write that dataset to a new location I get the following error:
>
> ```Error: Invalid: Expected to read 144 metadata bytes but got 0```
>
> I have made a reproducible example below:
>
> ``` r
> df <- data.frame(replicate(1,sample(0:1,100e6,rep=TRUE)))
> savePath <- file.path(tempdir(), 'arrowTest')
> if (!dir.exists(savePath)) {
>   dir.create(savePath)
> }
> arrow::write_feather(df, file.path(savePath, 'part-0.feather'))
> copyPath <- file.path(tempdir(),'arrowTest')
> if (!dir.exists(copyPath)) {
>   dir.create(copyPath)
> }
> writePath <- file.path(tempdir(), 'arrowTest')
> if (!dir.exists(writePath)) {
>   dir.create(writePath)
> }
> arrow::copy_files(savePath, copyPath)
> dataset <- arrow::open_dataset(copyPath, format='feather')
> arrow::write_dataset(dataset = dataset, path = writePath, format = 'feather')
> ```

--
This message was sent by Atlassian Jira
(v8.20.10#820010)