Fwd: Hadoop Weekly #191

Josh Elser Sun, 23 Oct 2016 17:06:06 -0700

Congrats, the 1.0.0-incubating release was picked up by Hadoop Weekly :)
---------- Forwarded message ----------
From: "Hadoop Weekly" <[email protected]>
Date: Oct 23, 2016 19:21
Subject: Hadoop Weekly #191
To: <[email protected]>
Cc:


Hadoop Weekly
> Issue #191
> 23 October 2016
>
> This week's issue is short and sweet with a few technical posts, two
> interesting news articles, and several exciting releases (including Apache
> Kafka 0.10.1.0). With Spark Summit Europe this week, expect lots of great
> content in the next issue. And if you're attending, please send interesting
> slides/talks my way!
>
> Technical
> =======
>
> Cloudera's CDH has supported intra-node disk balancing since version 5.8.2
> (it's also part of the 3.0.0 alpha Apache release). Using this feature, a
> data node can rebalance data blocks across disks using the `hdfs
> diskbalancer` command. This post describes how the tool works and shows how
> to run it.
>
> http://blog.cloudera.com/blog/2016/10/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/
>
>
> This post demonstrates the capabilities of the spark.ml library by
> building a logistic regression model to predict malignancy of cases from
> the Wisconsin Diagnostic Breast Cancer data set. The example code covers
> parsing, exploring a dataset with built-in statistics, extracting features
> from the input dataset, training the model, and evaluating the model.
>
> https://www.mapr.com/blog/predicting-breast-cancer-using-apache-spark-machine-learning-logistic-regression
>
>
> The Amazon Big Data blog has a tutorial for running RStudio with sparklyr
> on EMR. Thanks to a bootstrap action, a cluster complete with RStudio
> running on the master can be launched with a single command.
>
> https://aws.amazon.com/blogs/big-data/running-sparklyr-rstudios-r-interface-to-spark-on-amazon-emr/
>
>
> The Databricks blog features a list of seven tips for debugging Apache
> Spark code on Databricks. Most of the suggestions, like "Scale up Spark
> jobs slowly for really large datasets" and "Examine the partitioning for
> your dataset," are generally applicable to all Spark users.
>
> https://databricks.com/blog/2016/10/18/7-tips-to-debug-apache-spark-code-faster-with-databricks.html
>
>
> News
> ====
>
> InfoQ has an interview with Yahoo VP of Engineering, Peter Cnudde. Topics
> covered include Hadoop, Spark adoption at Yahoo (mostly for in-memory
> computing, not for ETL), and Caffe-on-Spark for deep learning.
>
> https://www.infoq.com/articles/peter-cnudde-yahoo-big-data
>
>
> ZDNet contributor Tony Baer has read between the lines when it comes to
> recent benchmarks by Cloudera and Hortonworks. The takeaways are as
> follows: 1) "SQL's the gateway drug to Hadoop," 2) Cloudera is trying to
> challenge Amazon (in this case Redshift), and 3) Hortonworks (via Hive's
> Live Long and Process, or LLAP) has caught up on the investment Cloudera
> made in Impala.
>
> http://www.zdnet.com/article/sql-on-hadoop-benchmarks-get-serious/
>
>
> Releases
> =======
>
> Apache Kafka 0.10.1.0 was released this week. It contains improvements
> from over 500 pull requests and the implementation of 15 Kafka Improvement
> Proposals. The Confluent blog has the highlights of additions/improvements
> to Kafka Server (time-based indexes, replication quotas, and improved log
> compaction), improvements to Kafka client APIs (interactive queries for
> Kafka Streams, improved memory management, secure quotas, and more), and
> bug fixes.
>
> http://mail-archives.apache.org/mod_mbox/kafka-users/201610.mbox/%3CCAJL4t_oz9q4T9vn6Z-EBoazWJFyqHw4Y0L-PTowD%2BpFhcPv0VQ%40mail.gmail.com%3E
> http://www.confluent.io/blog/announcing-apache-kafka-0-10-1-0/
>
> Apache Fluo (incubating) recently had its first release since entering
> the incubator. Fluo is a tool for making "incremental updates to large data
> sets stored in Apache Accumulo" a la Google's Percolator.
>
> https://fluo.apache.org/release/fluo-1.0.0-incubating/
>
>
> Apache Flume 1.7.0 was released. It adds support for a `taildir` source
> and includes a number of improvements and bug fixes. Many of these are
> around Flume's integration with Apache Kafka.
>
> http://flume.apache.org/releases/1.7.0.html
>
>
> Apache NiFi 0.7.1 was released as a follow-up to July's 0.7.0 release
> (version 1.0.0 was also released recently, in August). This release adds a
> number of improvements and bug fixes.
>
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.7.1
>
>
> Apache Giraph 1.2.0 was released. Highlights of the release include a new
> blocks API, support for graphs that don't fit in memory, and the addition
> of a new set of default configuration options based on Facebook's
> experience with Giraph.
>
> https://blogs.apache.org/giraph/entry/giraph_1_2_0_release
>
>
> `deeplearning4j` is a deep learning implementation that integrates with
> Hadoop and Spark and supports GPUs. Version 0.6.0 was recently released.
>
> https://github.com/deeplearning4j/deeplearning4j
>
>
> Events
> =====
> Curated by Datadog ( http://www.datadog.com )
> UNITED STATES
>
> California
> Uber Engineering Tech Talk Series (San Francisco) - Monday, October 24
> http://www.meetup.com/UberEvents/events/234789134/
>
> Real-Time Streaming and Exactly-Once Semantics with Kafka (San Francisco)
> - Tuesday, October 25
> http://www.meetup.com/MemSQL/events/234405914/
>
> Building Your First Spark & C* App + SMACK Stack + The Cassandra Odyssey
> (San Francisco) - Wednesday, October 26
> http://www.meetup.com/SF-Spark-and-Friends/events/234932979/
>
> Apache YARN Committers/Contributors Meetup #4 (Sunnyvale) - Thursday,
> October 27
> http://www.meetup.com/Hadoop-Contributors/events/234971372/
>
>
> Washington
> Kafka Palooza: LinkedIn, Microsoft Azure, MapR (Bellevue) - Monday,
> October 24
> http://www.meetup.com/Seattle-Apache-Kafka-Meetup/events/234836624/
>
>
> Nevada
> PixieDust: Making Python Visualizations Easier for Jupyter Notebooks with
> Spark (Las Vegas) - Monday, October 24
> http://www.meetup.com/Data-Science-Las-Vegas/events/234557659/
>
>
> Texas
> O&G Big Data Use Cases, by Hortonworks (Houston) - Thursday, October 27
> http://www.meetup.com/Houston-Hadoop-Meetup-Group/events/234282996/
>
>
> Kansas
> Using Data Quality to Support Analytics in Hadoop (Overland Park) -
> Tuesday, October 25
> http://www.meetup.com/Kansas-City-Big-Data-Projects-Group/events/234597551/
>
>
> Missouri
> Using Data Quality to Support Analytics in Hadoop (Kansas City) - Tuesday,
> October 25
> http://www.meetup.com/Kansas-City-Big-Data-Projects-Group/events/234597347/
>
>
> Illinois
> Big Data Streaming Platform Ecosystem (Chicago) - Tuesday, October 25
> http://www.meetup.com/ChicagoRealTimeStreamingAnalytics/events/234676872/
>
> Apache Spark 101 (Chicago) - Tuesday, October 25
> http://www.meetup.com/Chicago-Spark-Users/events/233999667/
>
>
> Ohio
> October Edition of MOHUG (Dublin) - Tuesday, October 25
> http://www.meetup.com/MOHUG-Mid-Ohio-Hadoop-User-Group/events/234416779/
>
>
> Florida
> Apache Spark (Miami) - Wednesday, October 26
> http://www.meetup.com/Miami-Hadoop-User-Group/events/234992451/
>
>
> New York
> Lambda-in-a-Box: Merging Apache Spark & HBase into an Open-Source Database
> (New York) - Thursday, October 27
> http://www.meetup.com/mysqlnyc/events/233775657/
>
> October Data Engineering Meetup (New York) - Thursday, October 27
> http://www.meetup.com/NYC-Data-Engineering/events/234946410/
>
>
> CANADA
> Toronto Apache Spark #14 (Toronto) - Wednesday, October 26
> http://www.meetup.com/Toronto-Apache-Spark/events/234878620/
>
> Introduction to MapR (Toronto) - Thursday, October 27
> http://www.meetup.com/Toronto-MapR-User-Group/events/231648976/
>
>
> UNITED KINGDOM
> Why SMACK for Fast Data (London) - Monday, October 24
> http://www.meetup.com/skillsmatter/events/234588911/
>
> Building Scalable Systems in a Changing Data Landscape (London) - Tuesday,
> October 25
> http://www.meetup.com/data-science-lab/events/234754144/
>
> Spark Structured Streaming in Practice (London) - Wednesday, October 26
> http://www.meetup.com/hadoop-users-group-uk/events/234876912/
>
>
> SPAIN
> Season Premiere with Reynold Xin, Co-Founder & Chief Architect at
> Databricks (Barcelona) - Thursday, October 27
> http://www.meetup.com/Spark-Barcelona/events/234463208/
>
> Introduction to Kafka (Malaga) - Friday, October 28
> http://www.meetup.com/Linux-Malaga/events/234826330/
>
>
> BELGIUM
> Spark Pre-Summit Meetup (Brussels) - Tuesday, October 25
> http://www.meetup.com/Spark-Belgium/events/234234256/
>
> Meeting on Streamsets, Datameer and Kudu (Kontich) - Tuesday, October 25
> http://www.meetup.com/Belgium-Cloudera-User-Group/events/234618841/
>
> Spark & Machine Learning Meetup (Brussels) - Thursday, October 27
> http://www.meetup.com/Data-Science-Community-Meetup/events/234173917/
>
>
> INDIA
> Introduction to Spark & Use Cases (Hyderabad) - Monday, October 24
> http://www.meetup.com/meetup-group-ytFpRTDs/events/234412261/
>
>
> AUSTRALIA
> Rethink SQL for Big Data with Apache Drill (Barton) - Tuesday, October 25
> http://www.meetup.com/Canberra-Big-Data-Converged-SQL-NoSQL-and-Real-Time/events/233463561/
>
> Spark Meetup October (Sydney) - Wednesday, October 26
> http://www.meetup.com/Sydney-Apache-Spark-User-Group/events/233723585/
>
> Rethink SQL for Big Data with Apache Drill (Melbourne) - Thursday, October
> 27
> http://www.meetup.com/Melbourne-Big-Data-Converged-SQL-NoSQL-and-Real-Time/events/233463459/
>
>
> ESTONIA
> Big Data: Spark and TensorFlow (Tallinn) - Monday, October 24
> http://www.meetup.com/Advanced-Java-Estonia/events/234612322/
>
>
>
>
> If you didn't receive this email directly, and you'd like to subscribe to
> weekly emails please visit http://hadoopweekly.com
>
> ==============================================
> You signed up for this email at hadoopweekly.com
>
> Unsubscribe [email protected] from this list:
> http://hadoopweekly.us6.list-manage.com/unsubscribe?u=
> c31415a60fb0bc4efbe86f45b&id=976fe003f4&e=b0d6d006e8&c=d7d5e262dd
>
> Our mailing address is:
> Hadoop Weekly
> PO Box 373
> Cranford, NJ 07016
> USA
>
