This post may ramble a little, but I hope it starts a discussion.
By default, Apache VXQuery returns the query result to the CLI, which prints it to the screen. In practice, I usually pipe the output to a file for review. Do you think we should add an option to save the result to a file (local or HDFS)?

I think this will become an issue, and a speed concern, once we start running VXQuery in a YARN cluster [1]. Currently the CLI must stay running for the whole query in order to receive the result. It would be nice to decouple these processes, although doing so raises two questions: how do we know when the query is complete, and how do we save the result?

Things to discuss:

alpha: Should we write the result to a file (local or HDFS)? Currently the result is read and returned to the user through the CLI. The CLI could save the result to a file instead. (Sounds easy.)

bravo: Can writing the result to a file be pushed into the Hyracks job? The goal would be to let the CLI create and submit the job while a separate process reads the result once the job has finished. The client would be able to disconnect from the server while the job is running and reconnect later to get the result (no more need to keep the CLI in a screen session).

charlie: What workflow would we like to see for running a query on a YARN VXQuery cluster? See diagram [1].

[1] https://docs.google.com/drawings/d/13_kP4Yt1ze_pgqQcbVLmlBOxE6aX0Pmjg3FT2q4XX2k/edit?usp=sharing
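
To make item bravo a bit more concrete, here is a rough Java sketch of the submit-then-fetch-later idea. Everything in it is an assumption for the sake of discussion: the class DecoupledResultSketch and its submit/fetch methods are hypothetical and are not part of VXQuery or Hyracks, the query is stood in for by a plain Supplier<String>, and a local directory stands in for wherever the result would really live (local disk or HDFS). It also suggests one possible answer to "how do you know when the query is complete": the job writes to a temporary file and renames it when done, so the final file only ever appears once the result is complete.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical sketch only; none of these names exist in VXQuery or Hyracks. */
final class DecoupledResultSketch {
    private final Path resultDir;
    // Stands in for the cluster that would actually run the job.
    private final ExecutorService workers = Executors.newCachedThreadPool();

    DecoupledResultSketch(Path resultDir) {
        this.resultDir = resultDir;
    }

    /** Starts the query in the background and returns a job id right away. */
    String submit(Supplier<String> query) {
        String jobId = UUID.randomUUID().toString();
        workers.execute(() -> {
            try {
                // Write to a temporary file first, then rename it, so that a
                // reader never sees a partially written result.
                Path tmp = resultDir.resolve(jobId + ".tmp");
                Files.write(tmp, query.get().getBytes(StandardCharsets.UTF_8));
                Files.move(tmp, resultDir.resolve(jobId + ".result"),
                        StandardCopyOption.ATOMIC_MOVE);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        return jobId;
    }

    /** Returns the result if the job has finished, or empty if it has not. */
    Optional<String> fetch(String jobId) {
        Path done = resultDir.resolve(jobId + ".result");
        if (!Files.exists(done)) {
            return Optional.empty();
        }
        try {
            return Optional.of(
                    new String(Files.readAllBytes(done), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stops accepting new jobs; jobs already submitted still finish. */
    void shutdown() {
        workers.shutdown();
    }
}
```

Note that fetch only needs the job id and the result location, so a client that disconnected after submit could call it from a completely new process later on. The sketch does not handle failed jobs (no result file would ever appear), which is something the real design would need to signal, for example with a separate status marker.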
