On Thu, Sep 10, 2015 at 7:35 PM, Preston Carman <[email protected]> wrote:

> So this post may be a little rambling, but I hope it starts the discussion.
>
> Apache VXQuery by default returns the result to the CLI and prints it to
> the screen. In practice, I usually pipe the output to a file for review. Do
> you think we should add an option to save the result to a file (local or
> HDFS)? I think this will become an issue/speed concern as we start running
> VXQuery in a YARN cluster [1]. Currently the CLI must be running for the
> whole query to receive the result. It would be nice to decouple these
> processes. This creates two issues, though: how do we know when the query
> is complete, and how will we save the result?
>
> Things to discuss:
>
> alpha: Should we write the result to a file (local or HDFS)? Currently the
> result is read and returned to the user through the CLI. The CLI could save
> the result to a file instead. (sounds easy)
Based on my experience with very large result sets, I strongly recommend
the following approach:

- First and foremost, have an internal API that specifies a kind of event
  listener, and have output always written to that event listener. (In the
  case of XML, the event listener would be a SAX ContentHandler.)
- The default event listener would simply serialize the output events into
  a stream, thereby implementing the functionality to write to standard
  output or a file. (In the case of XML, the default event listener would
  be a Transformer with a StreamResult.)
- Alternative and custom event listeners could, for example, filter and
  count events, discarding all data.

Jochen

-- 
The next time you hear: "Don't reinvent the wheel!"

http://www.keystonedevelopment.co.uk/wp-content/uploads/2014/10/evolution-of-the-wheel-300x85.jpg
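
To make the listener idea concrete, here is a minimal sketch using only the JDK's SAX and TrAX classes. It is an illustration under stated assumptions, not VXQuery code: the class name ResultListenerSketch, the emitResult method (a stand-in for whatever produces the query result), and CountingHandler are all hypothetical names. The producer writes only to a ContentHandler; one listener serializes the events through an identity Transformer into a StreamResult, the other counts elements and discards the data.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ResultListenerSketch {

    /** Hypothetical stand-in for the query engine: emits a tiny result as SAX events. */
    static void emitResult(ContentHandler out) throws Exception {
        AttributesImpl noAttrs = new AttributesImpl();
        out.startDocument();
        out.startElement("", "result", "result", noAttrs);
        for (String value : new String[] {"a", "b", "c"}) {
            out.startElement("", "item", "item", noAttrs);
            out.characters(value.toCharArray(), 0, value.length());
            out.endElement("", "item", "item");
        }
        out.endElement("", "result", "result");
        out.endDocument();
    }

    /** Default listener: an identity Transformer that serializes events to a stream. */
    static String serialize() throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // A StringWriter keeps the sketch self-contained; a file or
        // System.out could back the StreamResult instead.
        StringWriter sink = new StringWriter();
        handler.setResult(new StreamResult(sink));
        emitResult(handler);
        return sink.toString();
    }

    /** Custom listener: counts element starts and discards all data. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());

        CountingHandler counter = new CountingHandler();
        emitResult(counter);
        System.out.println("elements: " + counter.elements);
    }
}
```

Since the producer only ever sees a ContentHandler, switching between standard output, a local file, or some other sink would mean changing which listener is plugged in, with the producer left untouched.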
