Thanks for the reply, Jay,

Yep, I've started on BIGTOP-1019. Now I'm trying to figure out how the
smoke test suite runs in distributed mode. One thought: keeping things
simple, for the sake of running smokes on a distributed cluster, isn't it
simpler to just scp this file around into folders like
"/tmp/bigtop-smokes-run/sqoop/" on every node?

I don't have a cluster handy, so once I have some patch put together,
I'll probably start testing it locally on a VM and attach it to the JIRA.

Also, based on the discussion in the JIRA, the original proposal was to
use an H2 database, then the focus shifted to a mock JDBC data source,
and now it's HSQL. Or am I not following the topic correctly?




2014/1/12 Jay Vyas <[email protected]>

> Hi Mikhail!
>
> ---- Regarding the need for a cluster, IMO it's imperative to run in some
> kind of cluster. The reasons are that:
>
> 1) You want your test to transfer a file into an RDBMS, and you want it
> to verify that the distributed cluster is actually breaking the work up
> properly.
>
> 2) Also we would want to confirm that the cluster is able to distribute
> exporting (copying from the distributed file system into an RDBMS) as well.
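>
> To make that concrete, a smoke along these lines might run roughly the
> commands below (the JDBC URL, table names and directories are
> placeholders, not something BIGTOP-1019 has settled on):
>
>   # import: RDBMS -> HDFS, forcing more than one mapper so the
>   # distributed split of the work actually gets exercised
>   sqoop import --connect jdbc:hsqldb:file:/shared/smoke/testdb \
>     --driver org.hsqldb.jdbcDriver \
>     --table SMOKE_IN --split-by ID -m 2 \
>     --target-dir /tmp/sqoop-smoke-out
>
>   # export: HDFS -> RDBMS, again across multiple mappers
>   sqoop export --connect jdbc:hsqldb:file:/shared/smoke/testdb \
>     --driver org.hsqldb.jdbcDriver \
>     --table SMOKE_OUT -m 2 \
>     --export-dir /tmp/sqoop-smoke-out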
>
> TL;DR: Definitely validate your sqoop smokes in a distributed cluster for
> sqoop. It can be a couple of VMs ... nothing fancy, but more than one
> machine. For other tests (wordcount, for example) I guess testing on a
> single node is probably a little more acceptable.
>
> ** If you don't have a cluster, just point me at a patch! We can test it
> for you. **
>
> --- Now regarding sqoop tests:
>
> By the way ... are you working on BIGTOP-1019? If so, thanks! One
> possible implementation is using HSQL in file mode, and then putting the
> database file as a locally readable file on all machines. I haven't
> figured out a way to do that in Hadoop with HDFS yet, but on gluster we
> typically FUSE mount, so I have a very simple smoke test that works
> nicely for sqoop on gluster ... you might be interested to check it out.
> If you can find a way to make a single file locally available to all
> nodes on a Hadoop cluster, it will work (and be easy to maintain: no
> network connection required for the JDBC stuff - just pure JDBC ETL).
> It's shell-scripted here (but we could port the shell commands easily to
> iTest commands in Groovy):
>
> https://github.com/jayunit100/bigtop/blob/master/bigtop-tests/test-artifacts/sqoop/src/main/resources/test2.sh
>
> - You can see that it runs HSQL in "file" mode, and stores the database
> as a file in a gluster-mounted directory.
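>
> In file mode the JDBC connection string is just a path, so any node that
> can see the mount can open the database; no server process is needed. As
> a rough illustration of seeding such a database (sqltool.jar ships with
> the HSQLDB distribution; the paths and SQL here are made up):
>
>   # create and populate the file-mode database on the shared mount
>   java -jar sqltool.jar \
>     --inlineRc=url=jdbc:hsqldb:file:/mnt/gluster/smoke/testdb,user=SA,password= \
>     seed.sql
>
>   # where seed.sql might contain:
>   #   CREATE TABLE SMOKE_IN (ID INT PRIMARY KEY, NAME VARCHAR(64));
>   #   INSERT INTO SMOKE_IN VALUES (1, 'one');
>   #   INSERT INTO SMOKE_IN VALUES (2, 'two');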
>
> - That directory is available to every node of my cluster.
>
> - So that is actually a very concrete example of how, if I didn't run the
> test in distributed mode, a bug in the distributed side of my code could
> slip through unnoticed.
>
>
> On Sun, Jan 12, 2014 at 8:37 PM, Mikhail Antonov <[email protected]> wrote:
>
> > Hello everyone,
> >
> > I have one question on the "proper" approach to test development. While
> > working on (for example) tests for Sqoop (migrating them from MySQL to a
> > mocked JDBC driver), what's the "right" approach to iterating?
> >
> > Say I have a host machine with Fedora, where I have a build environment
> > and build Bigtop. Shall I have one manually created VM with minimal
> > CentOS, where I copy the built RPMs, install them, and run smokes (or
> > generate the VM in an automated way)? Or shall I have a cluster of VMs
> > to run a fully-distributed cluster? Or is it more or less fine to just
> > run smokes on the host machine?
> >
> > --
> > Thanks,
> > Mikhail Antonov
> >
>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>



-- 
Thanks,
Mikhail Antonov
