So this post may be a little rambling, but I hope it starts the discussion.

Apache VXQuery by default returns the result to the CLI and prints it to
the screen. In practice, I usually pipe the output to a file for review. Do
you think we should add an option to save the result to a file (local or
HDFS)? I think this will become an issue, particularly for speed, as we start
running VXQuery in a YARN cluster [1]. Currently the CLI must keep running for
the whole query in order to receive the result. It would be nice to decouple
these processes, although that raises two questions: how do we know when the
query is complete, and how will we save the result?

Things to discuss:

alpha: Should we write the result to a file (local or HDFS)? Currently the
result is read and returned to the user through the CLI. The CLI could save
the result to a file instead (sounds easy).
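To make "alpha" concrete, here is a minimal sketch of a CLI-side output option, assuming a hypothetical `--output` flag (not an existing VXQuery option) and covering only local files; writing to HDFS would additionally need an HDFS client. The names `emit_result` and `parse_args` are illustrative only.

```python
import argparse
import sys
from typing import List, Optional


def emit_result(result: str, output_path: Optional[str] = None) -> None:
    """Write the query result to a local file if a path is given, else to stdout."""
    if output_path is None:
        # Current behaviour: print the result to the screen.
        sys.stdout.write(result)
    else:
        # Proposed behaviour: save the result so the user no longer has to pipe it.
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(result)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical CLI shape; "--output" is an assumption, not a real VXQuery flag.
    parser = argparse.ArgumentParser(description="Run an XQuery (sketch)")
    parser.add_argument("query_file", help="XQuery file to run")
    parser.add_argument("--output", default=None, help="local file to save the result to")
    return parser.parse_args(argv)
```

Usage would be something like `args = parse_args(["query.xq", "--output", "result.xml"])` followed by `emit_result(result, args.output)`. Leaving `--output` unset keeps today's print-to-screen behaviour, so the option is purely additive.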

bravo: Can writing the result to a file be pushed into the Hyracks job? The
goal would be to allow the CLI to create and send the job while a separate
process reads the result once the job is finished. The client would be able
to disconnect from the server while the job is running and connect back later
to get the result (no more need to keep the CLI running in a screen session).
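One possible shape for "bravo" is a submit-then-poll workflow, sketched below under several assumptions: a local directory stands in for HDFS, a background thread stands in for the Hyracks job, and `submit`, `is_complete`, `fetch_result` and the `<job id>.xml` naming scheme are all hypothetical, not VXQuery or Hyracks APIs. The job writes its own result to a temporary file and atomically renames it, so the presence of the final file answers "how do we know when the query is complete" for any process that knows the job id.

```python
import os
import threading
import time
import uuid


def result_path(result_dir: str, job_id: str) -> str:
    # Hypothetical naming scheme: one result file per job id.
    return os.path.join(result_dir, job_id + ".xml")


def submit(query: str, result_dir: str) -> str:
    """Start a job that writes its own result; return its id immediately."""
    job_id = uuid.uuid4().hex
    final = result_path(result_dir, job_id)

    def run() -> None:
        # Stand-in for the cluster job evaluating the query (placeholder result).
        result = "<result>" + query + "</result>"
        tmp = final + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(result)
        # Atomic rename: the final file only appears once it is complete.
        os.replace(tmp, final)

    threading.Thread(target=run).start()
    return job_id


def is_complete(result_dir: str, job_id: str) -> bool:
    # Completion is signalled by the final file existing, so no live
    # connection between client and job is needed to check it.
    return os.path.exists(result_path(result_dir, job_id))


def fetch_result(result_dir: str, job_id: str,
                 poll_seconds: float = 0.05, timeout: float = 5.0) -> str:
    """'Connect back later': poll until the result exists, then read it."""
    deadline = time.monotonic() + timeout
    while not is_complete(result_dir, job_id):
        if time.monotonic() > deadline:
            raise TimeoutError("job %s not finished" % job_id)
        time.sleep(poll_seconds)
    with open(result_path(result_dir, job_id), encoding="utf-8") as f:
        return f.read()
```

In this sketch the CLI would only need to print the job id returned by `submit` and exit; a later, unrelated invocation could call `fetch_result` with that id. Whether Hyracks can write results this way, and what a real status check would look like, is exactly the open question above.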

charlie: What is the workflow we would like to see for running a query on a
YARN VXQuery cluster? See diagram [1].


[1]
https://docs.google.com/drawings/d/13_kP4Yt1ze_pgqQcbVLmlBOxE6aX0Pmjg3FT2q4XX2k/edit?usp=sharing
