On Thu, Sep 10, 2015 at 7:35 PM, Preston Carman <[email protected]> wrote:
> So this post may be a little rambling, but I hope it starts the discussion.
>
> Apache VXQuery by default returns the result to the CLI and prints it to
> the screen. In practice, I usually pipe the output to a file for review. Do
> you think we should add an option to save the result to a file (local or
> HDFS)? I think this will become an issue/speed concern as we start running
> VXQuery in a YARN cluster [1]. Currently the CLI must be running for the
> whole query to receive the result. It would be nice to decouple these
> processes. This creates two issues, though: how do we know when the query
> is complete, and how will we save the result?
>
> Things to discuss:
>
> alpha: Should we write the result to a file (local or HDFS)? Currently the
> result is read and returned to the user through the CLI. The CLI could save
> the result to a file instead. (sounds easy)

Based on my experience with very large result sets, I strongly
recommend the following approach:

- First and foremost, have an internal API that specifies a kind of
  event listener, and always write output to that listener. (In the
  case of XML, the event listener would be a SAX ContentHandler.)
- The default event listener would simply serialize the output events
  into a stream, thereby implementing the functionality to write to
  standard output or a file. (In the case of XML, the default event
  listener would be a Transformer with a StreamResult.)
- Alternative, custom event listeners could, for example, filter and
  count events, discarding all data.
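
To make the idea concrete, here is a minimal sketch of the two kinds of
listener, using only standard JAXP/SAX classes. It is not VXQuery code:
the class ResultListenerSketch and the methods serializingHandler and
emitResult are hypothetical names, and emitResult merely stands in for
whatever the query engine would do when it produces output. For the
default listener the sketch uses a TransformerHandler, which is the JAXP
type that is both a ContentHandler and backed by a Transformer.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of VXQuery.
class ResultListenerSketch {

    /** Custom listener: counts elements and discards all data. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /** Default listener: serializes the incoming events to a stream. */
    static ContentHandler serializingHandler(Writer out) throws Exception {
        // The JDK's built-in factory supports SAX; the cast is the usual idiom.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer()
               .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    /** Stand-in for the engine: always writes its result to the listener. */
    static void emitResult(ContentHandler listener) throws SAXException {
        listener.startDocument();
        listener.startElement("", "results", "results", new AttributesImpl());
        for (int i = 0; i < 3; i++) {
            listener.startElement("", "item", "item", new AttributesImpl());
            char[] text = String.valueOf(i).toCharArray();
            listener.characters(text, 0, text.length);
            listener.endElement("", "item", "item");
        }
        listener.endElement("", "results", "results");
        listener.endDocument();
    }

    public static void main(String[] args) throws Exception {
        // Default behaviour: serialize to a stream (stdout or a file would
        // work the same way with a different Writer).
        StringWriter buffer = new StringWriter();
        emitResult(serializingHandler(buffer));
        System.out.println(buffer);

        // Alternative behaviour: count events, keep no data.
        CountingHandler counter = new CountingHandler();
        emitResult(counter);
        System.out.println("elements seen: " + counter.elements);
    }
}
```

The point of the design is that emitResult never knows where the output
goes. Writing to a local file, to HDFS, or nowhere at all becomes a
matter of which listener is plugged in.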

Jochen



-- 
The next time you hear: "Don't reinvent the wheel!"

http://www.keystonedevelopment.co.uk/wp-content/uploads/2014/10/evolution-of-the-wheel-300x85.jpg
