houqp commented on a change in pull request #1928:
URL: https://github.com/apache/arrow-datafusion/pull/1928#discussion_r820143271



##########
File path: benchmarks/db-benchmark/db-benchmark.dockerfile
##########
@@ -0,0 +1,54 @@
+FROM ubuntu
+ARG DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && \
+    apt-get install -y git build-essential
+
+# Install R, curl, and python deps
+RUN apt-get update && apt-get -y install --no-install-recommends --no-install-suggests \
+    ca-certificates software-properties-common gnupg2 gnupg1 \
+    && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
+    && add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' \
+    && apt-get -y install r-base \
+    && apt-get -y install curl \
+    && apt-get -y install python3.8 \
+    && apt-get -y install python3-pip
+
+# Install R libraries
+RUN R -e "install.packages('data.table',dependencies=TRUE, repos='http://cran.rstudio.com/')" \
+    && R -e "install.packages('dplyr',dependencies=TRUE, repos='http://cran.rstudio.com/')"
+
+# Install Rust
+RUN curl https://sh.rustup.rs -sSf | bash -s -- -y
+ENV PATH="/root/.cargo/bin:${PATH}"
+
+# Clone db-benchmark and download data
+RUN git clone https://github.com/h2oai/db-benchmark \
+    && Rscript db-benchmark/_data/groupby-datagen.R 1e7 1e2 0 0 \
+    && Rscript db-benchmark/_data/join-datagen.R 1e7 0 0 0
+
+# Copy local arrow-datafusion
+COPY . arrow-datafusion
+
+# Clone datafusion-python and build python library
+# Not sure if the wheel will be the same on all computers
+RUN git clone https://github.com/datafusion-contrib/datafusion-python \

Review comment:
       It would be good to clone a particular tag or commit here to make this more reproducible.
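
One way this could look, as a minimal sketch rather than the change the PR should necessarily adopt: the build argument name `DATAFUSION_PYTHON_REF` is hypothetical, and no specific tag or commit is implied. The ref is supplied at build time, so the image always builds from the same datafusion-python source.

```dockerfile
# Hypothetical build argument (name chosen for illustration only).
# Supply a real tag or commit SHA at build time, for example:
#   docker build --build-arg DATAFUSION_PYTHON_REF=<tag-or-commit> ...
ARG DATAFUSION_PYTHON_REF

# Fail early if no ref was supplied, then clone and check out the pinned ref
# so that every build uses the same source tree.
RUN test -n "${DATAFUSION_PYTHON_REF}" \
    && git clone https://github.com/datafusion-contrib/datafusion-python \
    && git -C datafusion-python checkout "${DATAFUSION_PYTHON_REF}"
```

A full commit SHA gives the strongest guarantee, since a tag can be moved after the fact. A plain clone followed by `git checkout` works for both tags and commits, whereas `git clone --branch` accepts only branch and tag names.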




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

