[Hadoop Wiki] Update of "Hbase/Cascading" by Misty

Apache Wiki Sun, 01 Nov 2015 20:58:35 -0800

The "Hbase/Cascading" page has been changed by Misty:
https://wiki.apache.org/hadoop/Hbase/Cascading?action=diff&rev1=4&rev2=5

- [[http://www.cascading.org/|Cascading]] is an alternative API to Hadoop MapReduce. Under the covers it uses MapReduce during execution, but during development, users don't have to think in MapReduce to create solutions for execution on Hadoop.
+ The HBase Wiki is in the process of being decommissioned. The info that used to be on this page has moved to https://hbase.apache.org/book.html#cascading. Please update your bookmarks.
  
- Cascading now has support for reading and writing data to and from an HBase cluster.
- 
- Detailed information and access to the source code can be found on the [[http://www.cascading.org/modules.html|Cascading Modules]] page. [[http://code.google.com/p/cascading/downloads/list|Cascading 1.0.1]] is required.
- 
- Here is a simple example showing how to "sink" data into an HBase cluster. Note that the exact same "hBaseTap" instance can be used to "source" data as well (as shown in the unit tests). See the GitHub repo, linked from the modules page, for a more up-to-date API.
- 
- {{{#!java
- // read data from the default filesystem
- // emits two fields: "offset" and "line"
- Tap source = new Hfs( new TextLine(), inputFileLhs );
- 
- // store data in an HBase cluster
- // accepts fields "num", "lower", and "upper"
- // will automatically scope incoming fields to their proper familyname, "left" or "right"
- Fields keyFields = new Fields( "num" );
- String[] familyNames = {"left", "right"};
- Fields[] valueFields = new Fields[] {new Fields( "lower" ), new Fields( "upper" ) };
- Tap hBaseTap = new HBaseTap( "multitable", new HBaseScheme( keyFields, familyNames, valueFields ), SinkMode.REPLACE );
- 
- // a simple pipe assembly to parse the input into fields
- // a real app would likely chain multiple Pipes together for more complex processing
- Pipe parsePipe = new Each( "insert", new Fields( "line" ), new RegexSplitter( new Fields( "num", "lower", "upper" ), " " ) );
- 
- // "plan" a cluster executable Flow
- // this connects the source Tap and hBaseTap (the sink Tap) to the parsePipe
- Flow parseFlow = new FlowConnector( properties ).connect( source, hBaseTap, parsePipe );
- 
- // start the flow, and block until complete
- parseFlow.complete();
- 
- // open an iterator on the HBase table we stuffed data into
- TupleEntryIterator iterator = parseFlow.openSink();
- 
- while(iterator.hasNext())
-   {
-   // print out each tuple from HBase
-   System.out.println( "iterator.next() = " + iterator.next() );
-   }
- 
- iterator.close();
- }}}
- 
- Note the "hBaseTap" above can be used as both a sink and a source in a Flow. So another Flow could be created to process data stored in HBase.
-
