Congrats, the 1.0.0-incubating release was picked up by Hadoop Weekly :)

---------- Forwarded message ----------
From: "Hadoop Weekly" <[email protected]>
Date: Oct 23, 2016 19:21
Subject: Hadoop Weekly #191
To: <[email protected]>
Cc:

> Hadoop Weekly
> Issue #191
> 23 October 2016
>
> This week's issue is short and sweet with a few technical posts, two
> interesting news articles, and several exciting releases (including Apache
> Kafka 0.10.1.0). With Spark Summit Europe this week, expect lots of great
> content in the next issue. And if you're attending, please send interesting
> slides/talks my way!
>
> Technical
> =======
>
> Cloudera's CDH has supported intra-node disk balancing since version 5.8.2
> (it's also part of the 3.0.0 alpha Apache release). Using this feature, a
> data node can rebalance data blocks across disks using the
> `hdfs diskbalancer` command. This post describes how the tool works and
> shows how to run it.
>
> http://blog.cloudera.com/blog/2016/10/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/
>
>
> This post demonstrates the capabilities of the spark.ml library by
> building a logistic regression model to predict malignancy of cases from
> the Wisconsin Diagnostic Breast Cancer data set. The example code covers
> parsing, exploring a dataset with built-in statistics, extracting features
> from the input dataset, training the model, and evaluating the model.
>
> https://www.mapr.com/blog/predicting-breast-cancer-using-apache-spark-machine-learning-logistic-regression
>
>
> The Amazon Big Data blog has a tutorial for running RStudio with sparklyr
> on EMR. Thanks to a bootstrap action, a cluster complete with RStudio
> running on the master can be launched with a single command.
>
> https://aws.amazon.com/blogs/big-data/running-sparklyr-rstudios-r-interface-to-spark-on-amazon-emr/
>
>
> The Databricks blog features a list of seven tips for debugging Apache
> Spark code on Databricks. Most of the suggestions, like "Scale up Spark
> jobs slowly for really large datasets" and "Examine the partitioning for
> your dataset," are generally applicable to all Spark users.
>
> https://databricks.com/blog/2016/10/18/7-tips-to-debug-apache-spark-code-faster-with-databricks.html
>
>
> News
> ====
>
> InfoQ has an interview with Yahoo VP of Engineering, Peter Cnudde. Topics
> covered include Hadoop, Spark adoption at Yahoo (mostly for in-memory
> computing, not for ETL), and Caffe-on-Spark for deep learning.
>
> https://www.infoq.com/articles/peter-cnudde-yahoo-big-data
>
>
> ZDNet contributor Tony Baer has read between the lines when it comes to
> recent benchmarks by Cloudera and Hortonworks. The takeaways are as
> follows: 1) "SQL's the gateway drug to Hadoop." 2) Cloudera is trying to
> challenge Amazon (in this case Redshift), and 3) Hortonworks (via Hive's
> Live Long and Process) has caught up on the investment Cloudera made in
> Impala.
>
> http://www.zdnet.com/article/sql-on-hadoop-benchmarks-get-serious/
>
>
> Releases
> =======
>
> Apache Kafka 0.10.1.0 was released this week. It contains improvements
> from over 500 pull requests and the implementation of 15 Kafka Improvement
> Proposals. The Confluent blog has the highlights of additions/improvements
> to Kafka Server (time-based indexes, replication quotas, and improved log
> compaction), improvements to Kafka client APIs (interactive queries for
> Kafka Streams, improved memory management, secure quotas, and more), and
> bug fixes.
>
> http://mail-archives.apache.org/mod_mbox/kafka-users/201610.mbox/%3CCAJL4t_oz9q4T9vn6Z-EBoazWJFyqHw4Y0L-PTowD%2BpFhcPv0VQ%40mail.gmail.com%3E
> http://www.confluent.io/blog/announcing-apache-kafka-0-10-1-0/
>
> Apache Fluo (incubating) recently had its first release since entering
> the incubator. Fluo is a tool for making "incremental updates to large data
> sets stored in Apache Accumulo" a la Google's Percolator.
>
> https://fluo.apache.org/release/fluo-1.0.0-incubating/
>
>
> Apache Flume 1.7.0 was released. It adds support for a `taildir` source
> and includes a number of improvements and bug fixes.
> Many of these are around Flume's integration with Apache Kafka.
>
> http://flume.apache.org/releases/1.7.0.html
>
>
> Apache NiFi 0.7.1 was released as a follow-up to July's 0.7.0 release
> (version 1.0.0 was also recently released, in August). This release adds a
> number of improvements and bug fixes.
>
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.7.1
>
>
> Apache Giraph 1.2.0 was released. Highlights of the release include a new
> blocks API, support for graphs that don't fit in memory, and the addition
> of a new set of default configuration options based on Facebook's
> experience with Giraph.
>
> https://blogs.apache.org/giraph/entry/giraph_1_2_0_release
>
>
> `deeplearning4j` is a deep learning implementation that integrates with
> Hadoop and Spark and supports GPUs. Version 0.6.0 was recently released.
>
> https://github.com/deeplearning4j/deeplearning4j
>
>
> Events
> =====
> Curated by Datadog ( http://www.datadog.com )
> UNITED STATES
>
> California
> Uber Engineering Tech Talk Series (San Francisco) - Monday, October 24
> http://www.meetup.com/UberEvents/events/234789134/
>
> Real-Time Streaming and Exactly-Once Semantics with Kafka (San Francisco) - Tuesday, October 25
> http://www.meetup.com/MemSQL/events/234405914/
>
> Building Your First Spark & C* App + SMACK Stack + The Cassandra Odyssey (San Francisco) - Wednesday, October 26
> http://www.meetup.com/SF-Spark-and-Friends/events/234932979/
>
> Apache YARN Committers/Contributors Meetup #4 (Sunnyvale) - Thursday, October 27
> http://www.meetup.com/Hadoop-Contributors/events/234971372/
>
>
> Washington
> Kafka Palooza: LinkedIn, Microsoft Azure, MapR (Bellevue) - Monday, October 24
> http://www.meetup.com/Seattle-Apache-Kafka-Meetup/events/234836624/
>
>
> Nevada
> PixieDust: Making Python Visualizations Easier for Jupyter Notebooks with Spark (Las Vegas) - Monday, October 24
> http://www.meetup.com/Data-Science-Las-Vegas/events/234557659/
>
>
> Texas
> O&G Big Data Use Cases, by Hortonworks (Houston) - Thursday, October 27
> http://www.meetup.com/Houston-Hadoop-Meetup-Group/events/234282996/
>
>
> Kansas
> Using Data Quality to Support Analytics in Hadoop (Overland Park) - Tuesday, October 25
> http://www.meetup.com/Kansas-City-Big-Data-Projects-Group/events/234597551/
>
>
> Missouri
> Using Data Quality to Support Analytics in Hadoop (Kansas City) - Tuesday, October 25
> http://www.meetup.com/Kansas-City-Big-Data-Projects-Group/events/234597347/
>
>
> Illinois
> Big Data Streaming Platform Ecosystem (Chicago) - Tuesday, October 25
> http://www.meetup.com/ChicagoRealTimeStreamingAnalytics/events/234676872/
>
> Apache Spark 101 (Chicago) - Tuesday, October 25
> http://www.meetup.com/Chicago-Spark-Users/events/233999667/
>
>
> Ohio
> October Edition of MOHUG (Dublin) - Tuesday, October 25
> http://www.meetup.com/MOHUG-Mid-Ohio-Hadoop-User-Group/events/234416779/
>
>
> Florida
> Apache Spark (Miami) - Wednesday, October 26
> http://www.meetup.com/Miami-Hadoop-User-Group/events/234992451/
>
>
> New York
> Lambda-in-a-Box: Merging Apache Spark & HBase into an Open-Source Database (New York) - Thursday, October 27
> http://www.meetup.com/mysqlnyc/events/233775657/
>
> October Data Engineering Meetup (New York) - Thursday, October 27
> http://www.meetup.com/NYC-Data-Engineering/events/234946410/
>
>
> CANADA
> Toronto Apache Spark #14 (Toronto) - Wednesday, October 26
> http://www.meetup.com/Toronto-Apache-Spark/events/234878620/
>
> Introduction to MapR (Toronto) - Thursday, October 27
> http://www.meetup.com/Toronto-MapR-User-Group/events/231648976/
>
>
> UNITED KINGDOM
> Why SMACK for Fast Data (London) - Monday, October 24
> http://www.meetup.com/skillsmatter/events/234588911/
>
> Building Scalable Systems in a Changing Data Landscape (London) - Tuesday, October 25
> http://www.meetup.com/data-science-lab/events/234754144/
>
> Spark Structured Streaming in Practice (London) - Wednesday,
> October 26
> http://www.meetup.com/hadoop-users-group-uk/events/234876912/
>
>
> SPAIN
> Season Premiere with Reynold Xin, Co-Founder & Chief Architect at Databricks (Barcelona) - Thursday, October 27
> http://www.meetup.com/Spark-Barcelona/events/234463208/
>
> Introduction to Kafka (Malaga) - Friday, October 28
> http://www.meetup.com/Linux-Malaga/events/234826330/
>
>
> BELGIUM
> Spark Pre-Summit Meetup (Brussels) - Tuesday, October 25
> http://www.meetup.com/Spark-Belgium/events/234234256/
>
> Meeting on Streamsets, Datameer and Kudu (Kontich) - Tuesday, October 25
> http://www.meetup.com/Belgium-Cloudera-User-Group/events/234618841/
>
> Spark & Machine Learning Meetup (Brussels) - Thursday, October 27
> http://www.meetup.com/Data-Science-Community-Meetup/events/234173917/
>
>
> INDIA
> Introduction to Spark & Use Cases (Hyderabad) - Monday, October 24
> http://www.meetup.com/meetup-group-ytFpRTDs/events/234412261/
>
>
> AUSTRALIA
> Rethink SQL for Big Data with Apache Drill (Barton) - Tuesday, October 25
> http://www.meetup.com/Canberra-Big-Data-Converged-SQL-NoSQL-and-Real-Time/events/233463561/
>
> Spark Meetup October (Sydney) - Wednesday, October 26
> http://www.meetup.com/Sydney-Apache-Spark-User-Group/events/233723585/
>
> Rethink SQL for Big Data with Apache Drill (Melbourne) - Thursday, October 27
> http://www.meetup.com/Melbourne-Big-Data-Converged-SQL-NoSQL-and-Real-Time/events/233463459/
>
>
> ESTONIA
> Big Data: Spark and TensorFlow (Tallinn) - Monday, October 24
> http://www.meetup.com/Advanced-Java-Estonia/events/234612322/
>
>
> If you didn't receive this email directly, and you'd like to subscribe to
> weekly emails please visit http://hadoopweekly.com
