[
https://issues.apache.org/jira/browse/ARROW-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291844#comment-17291844
]
Neal Richardson commented on ARROW-11735:
-----------------------------------------
See r/data-raw/codegen.R for how this comes together (docs in comments at the
top of the file), and look at the bottom of r/src/filesystem.cpp for an example
of how this is working for s3. Changes to make:
* Make sure that everything that touches things in the parquet C++ namespace
(C++ classes that are {{parquet::something}}, header {{#include}} statements of
parquet paths) are inside {{#if defined(ARROW_R_WITH_PARQUET)}}
* Change the {{// [[arrow::export]]}} annotations on those functions to be {{//
[[parquet::export]]}}
* Add "parquet" to the features list at the top of data-raw/codegen.R
* This is the tricky part: in configure and configure.win, do a check similar
to how we look for whether {{ARROW_S3}} was enabled in the C++ build, and if so
add {{-DARROW_R_WITH_PARQUET}} to PKG_CFLAGS. This will be trickier on Windows
because I don't think the rwinlib bundle includes ArrowOptions.cmake (but it
will always have Parquet enabled). An alternative (perhaps more robust
approach) would be to follow the TEST_CMD approach and test for parquet/s3
headers (current L179 of configure).
* In configure.win, you may also need to massage the PKG_LIBS since it has
hard-coded that parquet and thrift are present (as they are in the rwinlib
bundles). This is perhaps a separate issue from this (since this issue is
particularly about Solaris and that's about doing different C++ dev builds on
Windows.
* Once you've done this, the build should succeed without Parquet, but tests
that involve Parquet will error. Revise skip_if_not_available() in
r/tests/testthat/helper-skip.R to check {{arrow_with_parquet()}} if feature ==
"parquet" (this function is generated by codegen.R), then add
{{skip_if_not_available("parquet")}} to all relevant tests.
* Search for {{arrow_with_s3()}} in the code and see where else we should do
the same with arrow_with_parquet(). Among the uses:
* Add parquet to arrow_info() alongside where arrow_with_s3() is checked
* Wrap any parquet doc examples {{if (arrow_with_parquet())}}
> [R] Allow parquet to be an optional component like S3
> -----------------------------------------------------
>
> Key: ARROW-11735
> URL: https://issues.apache.org/jira/browse/ARROW-11735
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: R
> Reporter: Neal Richardson
> Priority: Major
> Fix For: 4.0.0
>
>
> Parquet requires thrift and it seems that thrift (at least as of version
> 0.12) does not compile on Solaris:
> {code}
> /export/home/X1svPYR/Rtemp/RtmptF1MlN/file75097d284891/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/THttpServer.cpp:
> In member function virtual void
> apache::thrift::transport::THttpServer::parseHeader(char*):
> /export/home/X1svPYR/Rtemp/RtmptF1MlN/file75097d284891/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/THttpServer.cpp:50:74:
> error: strcasestr was not declared in this scope
> #define THRIFT_strcasestr(haystack, needle) strcasestr(haystack, needle)
> ^
> /export/home/X1svPYR/Rtemp/RtmptF1MlN/file75097d284891/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/THttpServer.cpp:62:9:
> note: in expansion of macro THRIFT_strcasestr
> if (THRIFT_strcasestr(value, "chunked") != NULL) {
> {code}
> (along with some boost endian header deprecation warnings)
> We could debug/patch that, or we could also make Parquet an optional feature
> in the R bindings. That might have some value anyway so that one could build
> a lighter/minimal R package, if that were helpful.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)