jonkeane commented on PR #45346: URL: https://github.com/apache/arrow/pull/45346#issuecomment-2614454649
Thanks for all the info @TysonStanley and @ben-schwen > "Safe option" would be not to drop the data.table class but only drop the `.internal.selfref` attribute, so data.table will detect that and allocate the right memory. ... I consider the initial report as WAI, although our warning could be better since we also warn about the serializing case when `saveRDS` and subsequent `readRDS`. _nods_ yeah, this sounds good to me. I'm happy to help with a PR for improving that warning in `data.table`. It makes a lot of sense for it to be there since it's about serializing those objects in any fashion, not just with Arrow/Parquet, but I wonder if there is anything that we should add to our documentation that calls this out as well. We could add something to https://arrow.apache.org/docs/r/articles/metadata.html#r-object-attributes which calls this case out as one of the cases where we can't preserve a portion of metadata, but the way that `data.table` works that's working as expected and self heals, etc. > We can silence the warning if we do what @nealrichardson suggested in https://github.com/apache/arrow/pull/45346#issuecomment-2613008898 though I'm not sure whether we want to or not. I pushed up https://github.com/apache/arrow/commit/459bde8b4bac76d31e70b8fcf45088ef1d99ea8b for discussion. Thanks for pushing this up for discussion (and extra thanks for jumping on this overall), IMHO we probably don't want to do this. If I am understanding correctly, there will still be a (small) performance hit when the memory is allocated and we should defer to `data.table`'s self healing here since it knows best when and how (and can message appropriately that there might be a small performance degradation that is coming from the fact that any serialization happened). It sounds like we will likely close this PR, but I can go back to the issue and follow up with a TL;DR there explaining what we learned here (and closing that too since it's working as intended). I'm a little surprised in the original issue they mention this (seeming to) work in older versions of Arrow. There's no way it _actually did_ serialize the pointer and bring it back faithfully, but maybe something in #41969 changed the serialization slightly such that `data.table` is now better able to recognize and warn about the need to self-heal? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
