[jira] [Commented] (YARN-6214) NullPointer Exception while querying timeline server API

2020-03-10 Thread Benjamin Kim (Jira)
[ https://issues.apache.org/jira/browse/YARN-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056315#comment-17056315 ] Benjamin Kim commented on YARN-6214: The root cause is that one of the apps is in

[jira] [Commented] (YARN-6214) NullPointer Exception while querying timeline server API

2020-02-27 Thread Benjamin Kim (Jira)
[ https://issues.apache.org/jira/browse/YARN-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047125#comment-17047125 ] Benjamin Kim commented on YARN-6214: It happened to me,   {code:java} {"

[spyder] Spyder 3.3.1 in Anaconda Navigator 1.8.7 Autocomplete and Online Help are not working

2018-08-15 Thread Benjamin Kim
I just started a Data Science class where they use Spyder as the IDE. After installing the latest Anaconda on my MacBook Pro with High Sierra and updating Spyder to 3.3.1, I got Spyder to launch fine. But when I try to get information about objects and methods (cmd-i), nothing comes up. Also,

Re: Append In-Place to S3

2018-06-07 Thread Benjamin Kim
ted correctly, if you're joining then overwrite otherwise only > append as it removes dups. > > I think, in this scenario, just change it to write.mode('overwrite') because > you're already reading the old data and your job would be done. > > > On Sat 2 Ju

Re: Zeppelin 0.8

2018-06-07 Thread Benjamin Kim
Can anyone tell me what the status is for the 0.8 release? > On May 2, 2018, at 4:43 PM, Jeff Zhang wrote: > > > Yes, 0.8 will support Spark 2.3 > > Benjamin Kim <bbuil...@gmail.com> wrote on Thu, May 3, 2018 at 1:59 AM: > Will Zeppelin 0.8 have Spark 2.3 support? >

Re: Credentials for JDBC

2018-06-07 Thread Benjamin Kim
Hi Jongyoul, Can you show me how? Thanks, Ben > On Jun 6, 2018, at 10:32 PM, Jongyoul Lee wrote: > > We have a trick to get credential information from a credential page. I'll > look into it. > > On Thu, Jun 7, 2018 at 7:53 AM, Benjamin Kim <bbuil...@gmail.com

Credentials for JDBC

2018-06-06 Thread Benjamin Kim
I created a JDBC interpreter for AWS Athena, and it passes the access key as UID and secret key as PWD in the URL connection string. Does anyone know if I can setup each user to pass their own credentials in a, sort of, credentials file or config? Thanks, Ben
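For reference, a rough sketch of what such an interpreter setup looks like, using Zeppelin's generic JDBC interpreter properties (the driver class and URL format vary by Athena JDBC driver version, so treat these values as illustrative):

    default.driver    com.amazonaws.athena.jdbc.AthenaDriver
    default.url       jdbc:awsathena://athena.us-east-1.amazonaws.com:443/
    default.user      <athena-access-key>
    default.password  <athena-secret-key>

Per-user credentials would mean lifting the user/password pair out of these shared interpreter properties and resolving them per login instead, which is what the Credentials page discussed in the follow-up thread is meant for.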

Re: Append In-Place to S3

2018-06-02 Thread Benjamin Kim
: > Benjamin, > > The append will append the "new" data to the existing data with removing > the duplicates. You would need to overwrite the file every time if you need > unique values. > > Thanks, > Jayadeep > > On Fri, Jun 1, 2018 at 9:31 PM Benjamin Kim wrote:

Append In-Place to S3

2018-06-01 Thread Benjamin Kim
I have a situation where I'm trying to add only new rows to an existing data set that lives in S3 as gzipped parquet files, looping and appending for each hour of the day. First, I create a DF from the existing data, then I use a query to create another DF with the data that is new. Here is the co
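A minimal sketch of the pattern described above, with the dedup-then-overwrite fix suggested in the replies (Spark 2.x Scala; the paths and the join key "id" are hypothetical):

    val existing = spark.read.parquet("s3a://bucket/dataset/")
    val incoming = spark.read.parquet("s3a://bucket/staging/hour=01/")

    // keep only the incoming rows whose key is not already present
    val newRows = incoming.join(existing, Seq("id"), "left_anti")

    // overwriting the same path you are reading from is unsafe in Spark,
    // so write the merged result to a fresh location and swap it in afterwards
    existing.union(newRows)
      .write.mode("overwrite")
      .parquet("s3a://bucket/dataset_merged/")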

Re: Zeppelin 0.8

2018-05-02 Thread Benjamin Kim
Will Zeppelin 0.8 have Spark 2.3 support? > On Apr 30, 2018, at 1:27 AM, Rotem Herzberg > wrote: > > Thanks > > On Mon, Apr 30, 2018 at 11:16 AM, Jeff Zhang > wrote: > > I am preparing the RC for 0.8 > > > Rotem Herzberg > wrote on 201

Zeppelin 0.8 Release Date

2018-04-27 Thread Benjamin Kim
I would like to know what the tentative release date of Zeppelin 0.8 will be. I am waiting for Leaflet integration to easily chart markers and cluster them. Also, I am waiting for any improvements in the job scheduler, monitoring, and alerting, if any. Lastly, I am hoping for some integration wi

Re: Spark 2.2 Structured Streaming + Kinesis

2017-11-13 Thread Benjamin Kim
To add, we have a CDH 5.12 cluster with Spark 2.2 in our data center. On Mon, Nov 13, 2017 at 3:15 PM Benjamin Kim wrote: > Does anyone know if there is a connector for AWS Kinesis that can be used > as a source for Structured Streaming? > > Thanks. > >

Databricks Serverless

2017-11-13 Thread Benjamin Kim
I have a question about this. The documentation compares the concept similar to BigQuery. Does this mean that we will no longer need to deal with instances and just pay for execution duration and amount of data processed? I’m just curious about how this will be priced. Also, when will it be ready

Spark 2.2 Structured Streaming + Kinesis

2017-11-13 Thread Benjamin Kim
Does anyone know if there is a connector for AWS Kinesis that can be used as a source for Structured Streaming? Thanks.
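For context, open-source Spark 2.2 ships no Kinesis source for Structured Streaming; Databricks' runtime provides one, and qubole/kinesis-sql is an external connector with a similar surface. A sketch assuming such a connector is on the classpath (stream name, region, and options are illustrative):

    val events = spark.readStream
      .format("kinesis")
      .option("streamName", "my-stream")
      .option("region", "us-east-1")
      .option("initialPosition", "TRIM_HORIZON")
      .load()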

Serverless ETL

2017-10-17 Thread Benjamin Kim
With AWS having Glue and GCE having Dataprep, is Databricks coming out with an equivalent or better? I know that Serverless is a new offering, but will it go farther with automatic data schema discovery, profiling, metadata storage, change triggering, joining, transform suggestions, etc.? Just cur

DMP/CDP Profile Store

2017-08-30 Thread Benjamin Kim
I was wondering if anyone has worked on a DMP/CDP for storing user and customer profiles in Kudu. Each user will have their base IDs (aka identity graph) along with statistics based on their attributes, plus tables for these attributes grouped by category. Please let me know what you think of my

Re: Configure Impala for Kudu on Separate Cluster

2017-08-18 Thread Benjamin Kim
Todd, I'll keep this in mind. This information will be useful. I'll try again. Thanks, Ben On Wed, Aug 16, 2017 at 4:32 PM Todd Lipcon wrote: > On Wed, Aug 16, 2017 at 6:16 AM, Benjamin Kim wrote: > >> Hi, >> >> I found 2 issues. First, network connection i

Re: Configure Impala for Kudu on Separate Cluster

2017-08-16 Thread Benjamin Kim
of the master at one > node from another master node. E.g., if using telnet, from <the other node>, in the command-line shell: > > telnet <master-hostname> 7051 > (just substitute <master-hostname> and <port> with the appropriate > hostnames/IP addresses). > > > > Best rega

Re: Configure Impala for Kudu on Separate Cluster

2017-08-15 Thread Benjamin Kim
ocked (eg an iptables REJECT rule) > > -Todd > > On Mon, Aug 14, 2017 at 10:36 PM, Benjamin Kim wrote: > >> Hi Todd, >> >> I tried to create a Kudu table using impala shell, and I got this error. >> >> create table my_first_table >> ( >> id bigint,

Re: Configure Impala for Kudu on Separate Cluster

2017-08-15 Thread Benjamin Kim
ittedly pretty bad, but it basically means it's getting > "connection refused", indicating that either there is no master running on > that host or it has been blocked (eg an iptables REJECT rule) > > -Todd > > On Mon, Aug 14, 2017 at 10:36 PM, Benjamin Kim wrote: >

Re: Configure Impala for Kudu on Separate Cluster

2017-08-14 Thread Benjamin Kim
been tested much as far as I know, so I > wouldn't be surprised if there are issues with scheduling fragments given > lack of locality, etc, but I would expect it to at least "basically work". > (I've done it once or twice to copy data from one cluster to another) >

Configure Impala for Kudu on Separate Cluster

2017-08-14 Thread Benjamin Kim
Can someone help me with configuring Impala using Cloudera Manager for Kudu 1.4.0 on CDH 5.12.0? I cannot get it to connect using impala shell. Cheers, Ben

Re: Cloudera Spark 2.2

2017-08-04 Thread Benjamin Kim
Dhadoop.version=2.6.0-cdh5.10.1 -Phadoop-2.6 -Pvendor-repo -Pscala-2.10 >> -Psparkr -pl '!alluxio,!flink,!ignite,!lens,!cassandra,!bigquery,!scio' -e > > > You may needs additional steps depending which interpreters you use (like > R etc). > > > -- > Ruslan Dautkhan

Re: Cloudera Spark 2.2

2017-08-04 Thread Benjamin Kim
ink binaries are only available for official releases? > > > > -- > Ruslan Dautkhanov > > On Wed, Aug 2, 2017 at 4:41 PM, Benjamin Kim wrote: > >> Did you build Zeppelin or download the binary? >> >> On Wed, Aug 2, 2017 at 3:40 PM Ruslan Dautkhanov >

Re: Cloudera Spark 2.2

2017-08-02 Thread Benjamin Kim
> On Wed, Aug 2, 2017 at 4:31 PM, Benjamin Kim wrote: > >> Does this work with Zeppelin 0.7.1? We got an error when setting SPARK_HOME >> in zeppelin-env.sh to what you have below. >> >> On Wed, Aug 2, 2017 at 3:24 PM Ruslan Dautkhanov >> wrote: >> >>> Y

Re: Cloudera Spark 2.2

2017-08-02 Thread Benjamin Kim
[spark-shell startup banner] version 2.1.0.cloudera1 > > spark-submit and spark-shell are just shell script wrappers. > > -- > Ruslan Dautkhanov > > On Wed, Aug 2, 2017 at 10:22 AM, Benjamin Kim wrote: > >> According to the Zeppelin doc

Geo Map Charting

2017-08-02 Thread Benjamin Kim
Anyone every try to chart density clusters or heat maps onto a geo map of the earth in Zeppelin? Can it be done? Cheers, Ben

Re: Cloudera Spark 2.2

2017-08-02 Thread Benjamin Kim
ala 2.11? > Also Spark 2.2 now requires JDK8 I believe. > > > > -- > Ruslan Dautkhanov > > On Tue, Aug 1, 2017 at 6:26 PM, Benjamin Kim wrote: > >> Here is more. >> >> org.apache.zeppelin.interpreter.InterpreterException: WARNING: >> User-defined SP

Re: Cloudera Spark 2.2

2017-08-01 Thread Benjamin Kim
t is due to some classpath issue. I am not familiar with CDH; > please check whether the Spark of CDH includes the hadoop jar with it. > > Benjamin Kim wrote on Wed, Aug 2, 2017 at 8:22 AM: > >> Here is the error that was sent to me. >> >> org.apache.zeppelin.interpreter.I

Re: Cloudera Spark 2.2

2017-08-01 Thread Benjamin Kim
>> >> What's the error you see in the log? >> >> Benjamin Kim wrote on Wed, Aug 2, 2017 at 8:18 AM: >> >>> Has anyone configured Zeppelin 0.7.1 for Cloudera's release of Spark >>> 2.2? I can't get it to work. I downloaded the binary and set SPARK_HOME to >>> /opt/cloudera/parcels/SPARK2/lib/spark2. I must be missing something. >>> >>> Cheers, >>> Ben >>> >>

Cloudera Spark 2.2

2017-08-01 Thread Benjamin Kim
Has anyone configured Zeppelin 0.7.1 for Cloudera's release of Spark 2.2? I can't get it to work. I downloaded the binary and set SPARK_HOME to /opt/cloudera/parcels/SPARK2/lib/spark2. I must be missing something. Cheers, Ben
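For reference, pointing Zeppelin at a Spark parcel happens in conf/zeppelin-env.sh; a sketch of the relevant lines (SPARK_HOME matches the thread, the other values are illustrative):

    export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export MASTER=yarn-client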

Glue-like Functionality

2017-07-08 Thread Benjamin Kim
Has anyone seen AWS Glue? I was wondering if there is something similar going to be built into Spark Structured Streaming? I like the Data Catalog idea to store and track any data source/destination. It profiles the data to derive the schema and data types. Also, it does some sort-of automated s

Centos 7 Compatibility

2017-06-21 Thread Benjamin Kim
All, I’m curious to know if Zeppelin will work with CentOS 7. I don’t see it in the list of OS’s supported. Thanks, Ben

Re: Use SQL Script to Write Spark SQL Jobs

2017-06-12 Thread Benjamin Kim
Hi Bo, +1 for your project. I come from the world of data warehouses, ETL, and reporting analytics. There are many individuals who do not know or want to do any coding. They are content with ANSI SQL and stick to it. ETL workflows are also done without any coding using a drag-and-drop user inte

Re: Spark 2.1 and Hive Metastore

2017-04-09 Thread Benjamin Kim
> - Dan > > On Sun, Apr 9, 2017 at 11:13 AM, Benjamin Kim <bbuil...@gmail.com> wrote: > I’m curious about if and when Spark SQL will ever remove its dependency on > Hive Metastore. Now that Spark 2.1’s SparkSession has superseded the need for > HiveContext, are

Spark 2.1 and Hive Metastore

2017-04-09 Thread Benjamin Kim
I’m curious about if and when Spark SQL will ever remove its dependency on Hive Metastore. Now that Spark 2.1’s SparkSession has superseded the need for HiveContext, are there plans for Spark to no longer use the Hive Metastore service with a “SparkSchema” service with a PostgreSQL, MySQL, etc.

Re: Spark on Kudu Roadmap

2017-04-09 Thread Benjamin Kim
you may want to file a > JIRA to help track this feature. > > Mike > > > On Mon, Mar 27, 2017 at 11:55 AM, Benjamin Kim <bbuil...@gmail.com> wrote: > Hi Mike, > > I believe what we are looking for is this below. It is an often-requested use > case

Re: Spark on Kudu Roadmap

2017-03-27 Thread Benjamin Kim
> Is there anything in particular you are looking for? > > Thanks, > Mike > > On Mon, Mar 27, 2017 at 9:48 AM, Benjamin Kim <bbuil...@gmail.com> wrote: > Hi, > > Are there any plans for deeper integration with Spark especially Spark SQL? > Is ther

Spark on Kudu Roadmap

2017-03-27 Thread Benjamin Kim
Hi, Are there any plans for deeper integration with Spark especially Spark SQL? Is there a roadmap to look at, so I can know what to expect in the future? Cheers, Ben

Re: Kudu on top of Alluxio

2017-03-25 Thread Benjamin Kim
> caching. Also I don't recall Tachyon providing POSIX semantics. > > Mike > > Sent from my iPhone > >> On Mar 25, 2017, at 9:50 AM, Benjamin Kim wrote: >> >> Hi, >> >> Does anyone know of a way to use AWS S3 or >

Kudu on top of Alluxio

2017-03-25 Thread Benjamin Kim
Hi, Does anyone know of a way to use AWS S3 or

Kudu on top of Alluxio

2017-03-24 Thread Benjamin Kim
Hi, Does anyone know of a way to use AWS S3 or Alluxio on top of AWS S3 as the storage layer for Kudu? Thanks, Ben

Security Roadmap

2017-03-18 Thread Benjamin Kim
I’m curious as to what security features we can expect coming in the near and far future for Kudu. If there is some documentation for this, please let me know. Cheers, Ben

Login/Logout Problem

2017-03-01 Thread Benjamin Kim
We are running into problems where users login and staying logged in. When they try to run JDBC queries or even opening a notebook, they get flickering in the browser where the green color dot next to the username turns red, then back to green, then back to red, etc. When it stops doing that, th

Zeppelin Service Install

2017-03-01 Thread Benjamin Kim
Anyone have installed Zeppelin onto a CentOS/RedHat server and made it into a service? I can’t seem to find the instructions on how to do this. Cheers, Ben
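One common approach on CentOS/RHEL 7 is a small systemd unit wrapping zeppelin-daemon.sh; a sketch assuming Zeppelin lives under /opt/zeppelin and runs as a "zeppelin" user (both are assumptions):

    # /etc/systemd/system/zeppelin.service
    [Unit]
    Description=Apache Zeppelin
    After=network.target

    [Service]
    Type=forking
    User=zeppelin
    ExecStart=/opt/zeppelin/bin/zeppelin-daemon.sh start
    ExecStop=/opt/zeppelin/bin/zeppelin-daemon.sh stop
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

After placing the file: sudo systemctl daemon-reload && sudo systemctl enable --now zeppelin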

Re: Get S3 Parquet File

2017-02-24 Thread Benjamin Kim
de which needs you to update it once > again in 6 months because newer versions of SPARK now find it deprecated. > > > Regards, > Gourav Sengupta > > > > On Fri, Feb 24, 2017 at 7:18 AM, Benjamin Kim <bbuil...@gmail.com> wrote: > Hi Gourav, >

Re: Get S3 Parquet File

2017-02-23 Thread Benjamin Kim
o Spark 2.0/2.1. > > And besides that would you not want to work on a platform which is at least > 10 times faster? What would that be? > > Regards, > Gourav Sengupta > > On Thu, Feb 23, 2017 at 6:23 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > We are t

Re: Get S3 Parquet File

2017-02-23 Thread Benjamin Kim
can be > hidden and read from Input Params. > > Thanks, > Aakash. > > > On 23-Feb-2017 11:54 PM, "Benjamin Kim" <bbuil...@gmail.com> wrote: > We are trying to use Spark 1.6 within CDH 5.7.1 to retrieve a 1.3GB Parquet > file from AWS S

Get S3 Parquet File

2017-02-23 Thread Benjamin Kim
We are trying to use Spark 1.6 within CDH 5.7.1 to retrieve a 1.3GB Parquet file from AWS S3. We can read the schema and show some data when the file is loaded into a DataFrame, but when we try to do some operations, such as count, we get this error below. com.cloudera.com.amazonaws.AmazonClien
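For context, the failure pattern described here only surfaces on an action, since schema inspection reads just the Parquet footer. A sketch of the access pattern (Spark 1.6 Scala; bucket, key, and credential wiring are illustrative, and CDH shades the AWS SDK, which is why the error names com.cloudera.com.amazonaws classes):

    sc.hadoopConfiguration.set("fs.s3a.access.key", "<access-key>")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "<secret-key>")

    val df = sqlContext.read.parquet("s3a://bucket/path/file.parquet")
    df.printSchema()   // cheap: reads only the footer
    df.count()         // forces a full scan of the 1.3GB file, where the error appears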

Re: Parquet Gzipped Files

2017-02-14 Thread Benjamin Kim
ur vendor should use the parquet internal compression and not take a > parquet file and gzip it. > >> On 13 Feb 2017, at 18:48, Benjamin Kim wrote: >> >> We are receiving files from an outside vendor who creates a Parquet data >> file and Gzips it before delivery.

Parquet Gzipped Files

2017-02-13 Thread Benjamin Kim
We are receiving files from an outside vendor who creates a Parquet data file and Gzips it before delivery. Does anyone know how to Gunzip the file in Spark and inject the Parquet data into a DataFrame? I thought using sc.textFile or sc.wholeTextFiles would automatically Gunzip the file, but I’m
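Worth spelling out why the text APIs cannot help: sc.textFile and sc.wholeTextFiles do gunzip transparently, but Parquet is a binary, footer-indexed format that they cannot parse. One workaround is to strip the outer gzip wrapper first and then read the result as ordinary Parquet; a sketch with hypothetical local paths:

    import java.io.{FileInputStream, FileOutputStream}
    import java.util.zip.GZIPInputStream

    // strip the vendor's gzip wrapper
    val in  = new GZIPInputStream(new FileInputStream("/tmp/vendor.parquet.gz"))
    val out = new FileOutputStream("/tmp/vendor.parquet")
    val buf = new Array[Byte](8192)
    Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(n => out.write(buf, 0, n))
    in.close(); out.close()

    // now it is a plain Parquet file
    val df = sqlContext.read.parquet("file:///tmp/vendor.parquet")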

Remove dependence on HDFS

2017-02-11 Thread Benjamin Kim
Has anyone got some advice on how to remove the reliance on HDFS for storing persistent data. We have an on-premise Spark cluster. It seems like a waste of resources to keep adding nodes because of a lack of storage space only. I would rather add more powerful nodes due to the lack of processing

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
till getting the same error. Can you think of anything > else? > > Cheers, > Ben > > >> On Feb 2, 2017, at 11:06 AM, Asher Krim <ak...@hubspot.com> wrote: >> >> Ben, >> >> That looks like a scala version mismatch. Have you ch

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
did you see only scala 2.10.5 being pulled in? > > On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > Asher, > > It’s still the same. Do you have any other ideas? > > Cheers, > Ben > > >> On Feb 3, 2017, at 8:16 AM, A

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
to > check which version of the scala sdk your IDE is using > > Asher Krim > Senior Software Engineer > > > On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > Hi Asher, > > I modified the pom to be the same Spark (1.6.0), HBas

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
're seeing this locally, you might want to > check which version of the scala sdk your IDE is using > > Asher Krim > Senior Software Engineer > > On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim wrote: > > Hi Asher, > > I modified the pom to be the same Spark (1.6.0),

Re: HBase Spark

2017-02-02 Thread Benjamin Kim
her Krim wrote: > > Ben, > > That looks like a scala version mismatch. Have you checked your dep tree? > > Asher Krim > Senior Software Engineer > > > On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > Elek, >

Re: HBase Spark

2017-02-02 Thread Benjamin Kim
ltSource.createRelation(HBaseRelation.scala:51) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) If you can please help, I would be grateful. Cheers, Ben > O

Re: HBase Spark

2017-01-31 Thread Benjamin Kim
Elek, If I cannot use the HBase Spark module, then I’ll give it a try. Thanks, Ben > On Jan 31, 2017, at 1:02 PM, Marton, Elek wrote: > > > I tested this one with hbase 1.2.4: > > https://github.com/hortonworks-spark/shc > > Marton > > On 01/31/2017 09:17 P

HBase Spark

2017-01-31 Thread Benjamin Kim
Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I tried to build it from source, but I cannot get it to work. Thanks, Ben - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: PostgreSQL JDBC Connections

2017-01-05 Thread Benjamin Kim
> meaningful way e.g. use SQL results to create a new drop down to drive the > next page etc… > > > >> On Jan 5, 2017, at 12:57 PM, Benjamin Kim wrote: >> >> We are getting “out of shared memory” errors when multiple users are running >> SQL queries agains

PostgreSQL JDBC Connections

2017-01-05 Thread Benjamin Kim
We are getting “out of shared memory” errors when multiple users are running SQL queries against our PostgreSQL DB either simultaneously or throughout the day. When this happens, Zeppelin 0.6.0 becomes unresponsive for any more SQL queries. It looks like this is being caused by too many locks be
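For reference, "out of shared memory" under concurrent query load in PostgreSQL usually means the lock table is exhausted; the standard knob is in postgresql.conf (the value below is illustrative, and changing it requires a server restart):

    # postgresql.conf
    max_locks_per_transaction = 256   # default is 64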

Re: Merging Parquet Files

2016-12-22 Thread Benjamin Kim
seful. Thanks! 2016-12-23 7:01 GMT+09:00 Benjamin Kim : Has anyone tried to merge *.gz.parquet files before? I'm trying to merge them into 1 file after they are output from Spark. Doing a coalesce(1) on the Spark cluster will not work. It just does not have the resources to do it. I'm

Re: Merging Parquet Files

2016-12-22 Thread Benjamin Kim
wse/PARQUET-460> > > It seems parquet-tools allows merging small Parquet files into one. > > > Also, I believe there are command-line tools in Kite - > https://github.com/kite-sdk/kite > > This might be useful. > > > Th
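For reference, a sketch of the parquet-tools invocation mentioned above (jar and file names are illustrative; note that merge concatenates row groups rather than rewriting them, so many small row groups remain small):

    java -jar parquet-tools-<version>.jar merge \
      part-00000.gz.parquet part-00001.gz.parquet merged.parquet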

Merging Parquet Files

2016-12-22 Thread Benjamin Kim
Has anyone tried to merge *.gz.parquet files before? I'm trying to merge them into 1 file after they are output from Spark. Doing a coalesce(1) on the Spark cluster will not work. It just does not have the resources to do it. I'm trying to do it using the commandline and not use Spark. I will us

Re: Deep learning libraries for scala

2016-11-01 Thread Benjamin Kim
eed. But as it states deeper integration with (scala) is yet to be > developed. > Any thoughts on how to use tensorflow with scala? Need to write wrappers I > think. > > > On Oct 19, 2016 7:56 AM, "Benjamin Kim" <bbuil...@gmail.com> wrote: > On

Spark Streaming and Kinesis

2016-10-27 Thread Benjamin Kim
Has anyone worked with AWS Kinesis and retrieved data from it using Spark Streaming? I am having issues where it’s returning no data. I can connect to the Kinesis stream and describe using Spark. Is there something I’m missing? Are there specific IAM security settings needed? I just simply follo
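For context, the DStream-based receiver lives in the spark-streaming-kinesis-asl module; a minimal sketch of wiring it up (app/stream names, region, and intervals are illustrative). A common cause of a silently empty stream is IAM: the underlying KCL also needs DynamoDB (for its lease table) and CloudWatch permissions, not just Kinesis reads.

    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kinesis.KinesisUtils

    val ssc = new StreamingContext(sc, Seconds(10))
    val stream = KinesisUtils.createStream(
      ssc, "my-app", "my-stream",
      "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
      InitialPositionInStream.TRIM_HORIZON, Seconds(10),
      StorageLevel.MEMORY_AND_DISK_2)
    stream.map(bytes => new String(bytes)).print()
    ssc.start()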

Re: Deep learning libraries for scala

2016-10-19 Thread Benjamin Kim
On that note, here is an article that Databricks made regarding using Tensorflow in conjunction with Spark. https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html Cheers, Ben > On Oct 19, 2016, at 3:09 AM, Gourav Sengupta > wrote: > > while using Deep Lea

JDBC Connections

2016-10-18 Thread Benjamin Kim
We are using Zeppelin 0.6.0 as a self-service for our clients to query our PostgreSQL databases. We are noticing that the connections are not closing after each one of them is done. What is the normal operating procedure to have these connections close when idle? Our scope for the JDBC interpre

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Benjamin Kim
> table cache and expose it through the thriftserver. But you have to implement > the loading logic, it can be very simple to very complex depending on your > needs. > > > 2016-10-17 19:48 GMT+02:00 Benjamin Kim <bbuil...@gmail.com>: > Is this techniq

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Benjamin Kim
terface in to the big data world > revolves around the JDBC/ODBC interface. So if you don’t have that piece as > part of your solution, you’re DOA w respect to Tableau. > > Have you considered Drill as your JDBC connection point? (YAAP: Yet another > Apache project) >

Re: Schema Normalization

2016-10-10 Thread Benjamin Kim
which Impala would help. Thanks, Ben > On Oct 10, 2016, at 4:46 PM, Todd Lipcon wrote: > > On Mon, Oct 10, 2016 at 4:44 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > Todd, > > We are not going crazy with normalization. Actually, we are only normalizing >

Re: Schema Normalization

2016-10-10 Thread Benjamin Kim
Impala's query planner would do a lot better job than > Spark's, given that we don't currently expose information on table sizes to > Spark and thus it's likely to do a poor job of join ordering. > > Hope that helps > > -Todd > > > On Fri,

Re: Inserting New Primary Keys

2016-10-10 Thread Benjamin Kim
Is there only one process adding rows? Because this seems a little risky if > you have multiple threads doing that… > >> On Oct 8, 2016, at 1:43 PM, Benjamin Kim >> <bbuil...@gmail.com> wrote: >> >> Mich, >> >> After much searching, I

Re: Spark SQL Thriftserver with HBase

2016-10-09 Thread Benjamin Kim
ll provide an in-memory cache for interactive analytics. You > can put full tables in-memory with Hive using Ignite HDFS in-memory solution. > All this does only make sense if you do not use MR as an engine, the right > input format (ORC, parquet) and a recent Hive version. > >

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage o

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
aming specifics, there are at least 4 or 5 different implementations > of HBASE sources, each at varying level of development and different > requirements (HBASE release version, Kerberos support etc) > > > _ > From: Benjamin Kim <bbuil...

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
e it at your own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such &

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
experience with this! > > > _____ > From: Benjamin Kim <bbuil...@gmail.com> > Sent: Saturday, October 8, 2016 11:00 AM > Subject: Re: Spark SQL Thriftserver with HBase > To: Felix Cheung <felixcheun...@hotmail.com> > Cc: m

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
Thrift Server (with USING, > http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10). > > > _ > From: Benjamin Kim <bbuil...@gmail.com>

Re: Inserting New Primary Keys

2016-10-08 Thread Benjamin Kim
damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 8 Octo

Re: Spark SQL Thriftserver with HBase

2016-10-08 Thread Benjamin Kim
book.html#spark> > > And if you search you should find several alternative approaches. > > > > > > On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <bbuil...@gmail.com> wrote: > > Does anyone know if Spark can work with HBase tab

Inserting New Primary Keys

2016-10-08 Thread Benjamin Kim
I have a table with data already in it that has primary keys generated by the function monotonicallyIncreasingId. Now, I want to insert more data into it with primary keys that will auto-increment from where the existing data left off. How would I do this? There is no argument I can pass into th
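One common workaround for this: offset the generated ids by the current maximum (a sketch; "existing" and "incoming" are hypothetical DataFrames and "id" a hypothetical column, and note monotonicallyIncreasingId guarantees uniqueness and monotonicity, not contiguity, so gaps remain):

    import org.apache.spark.sql.functions.{max, monotonicallyIncreasingId}

    val maxId   = existing.agg(max("id")).first().getLong(0)
    val newRows = incoming.withColumn("id", monotonicallyIncreasingId() + (maxId + 1))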

Schema Normalization

2016-10-07 Thread Benjamin Kim
I would like to know if normalization techniques should or should not be necessary when modeling table schemas in Kudu. I read that a table with around 50 columns is ideal. This would mean a very wide table should be avoided. Thanks, Ben

Re: Kudu Command Line Client

2016-10-07 Thread Benjamin Kim
Todd, That works. Thanks, Ben > On Oct 7, 2016, at 5:03 PM, Todd Lipcon wrote: > > On Fri, Oct 7, 2016 at 5:01 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > Todd, > > I’m trying to use: > > kudu table list > > I get: > > Invalid argu

Re: Kudu Command Line Client

2016-10-07 Thread Benjamin Kim
> Hey Ben, > > Which command are you using? Try adding --help, and it should give you a > usage statement. > > -Todd > > On Fri, Oct 7, 2016 at 4:12 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > Does anyone know how to use the new Kudu command lin

Kudu Command Line Client

2016-10-07 Thread Benjamin Kim
Does anyone know how to use the new Kudu command line client? It used to be kudu-admin, but that is no more. I keep being asked for the master_addresses. I tried different combinations to no avail. Can someone direct me to the documentation for it? Thanks, Ben
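For reference, the tool takes the master addresses as a positional argument rather than a flag; a sketch with illustrative hostnames:

    kudu table list master1.example.com:7051,master2.example.com:7051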

Spark SQL Thriftserver with HBase

2016-10-07 Thread Benjamin Kim
Does anyone know if Spark can work with HBase tables using Spark SQL? I know in Hive we are able to create tables on top of an underlying HBase table that can be accessed using MapReduce jobs. Can the same be done using HiveContext or SQLContext? We are trying to setup a way to GET and POST data
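The Hive-on-HBase route in the question carries over to Spark through HiveContext, since the storage-handler DDL is plain HiveQL; a sketch (table, column family, and names are hypothetical, and the hive-hbase-handler jar must be on the classpath):

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    hc.sql("""
      CREATE EXTERNAL TABLE hbase_people (key STRING, name STRING)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
      TBLPROPERTIES ("hbase.table.name" = "people")
    """)
    hc.sql("SELECT * FROM hbase_people LIMIT 10").show()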

Re: RESTful Endpoint and Spark

2016-10-07 Thread Benjamin Kim
On Oct 6, 2016, at 4:27 PM, Benjamin Kim wrote: >> >> Has anyone tried to integrate Spark with a server farm of RESTful API >> endpoints or even HTTP web-servers for that matter? I know it’s typically >> done using a web farm as the presentation interface, then data flows thro

Re: Spark on Kudu

2016-10-06 Thread Benjamin Kim
ration_with_spark> > > On Tue, Sep 20, 2016 at 5:00 PM Benjamin Kim <bbuil...@gmail.com> wrote: > I see that the API has changed a bit so my old code doesn’t work anymore. Can > someone direct me to some code samples? > > Thanks, > Ben > >

RESTful Endpoint and Spark

2016-10-06 Thread Benjamin Kim
Has anyone tried to integrate Spark with a server farm of RESTful API endpoints or even HTTP web-servers for that matter? I know it’s typically done using a web farm as the presentation interface, then data flows through a firewall/router to direct calls to a JDBC listener that will SELECT, INSE

Re: Deep learning libraries for scala

2016-10-03 Thread Benjamin Kim
I got this email a while back in regards to this. Dear Spark users and developers, I have released version 1.0.0 of scalable-deeplearning package. This package is based on the implementation of artificial neural networks in Spark ML. It is intended for new Spark deep learning features that wer

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
> That sounds interesting, would love to learn more about it. > > Mitch: looks good. Lastly I would suggest you to think if you really need > multiple column families. > > On 4 Oct 2016 02:57, "Benjamin Kim" <bbuil...@gmail.com> wrote: > Lately, I

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
COLUMN+CELL > Tesco PLC > column=stock_daily:close, timestamp=1475447365118, value=325.25 > Tesco PLC > column=stock_daily:high, timestamp=1475447365118, value=332.00 > Tesc

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-02 Thread Benjamin Kim
any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 1 October 2016 at 23:39, Benjamin Kim

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-01 Thread Benjamin Kim
Mich, I know up until CDH 5.4 we had to add the HTrace jar to the classpath to make it work using the command below. But after upgrading to CDH 5.7, it became unnecessary. echo "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar" >> /etc/spark/conf/classpath.txt Hope this helps.

Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-21 Thread Benjamin Kim
I tried installing using Cloudera Manager and noticed that the documentation doesn’t state the URL to enter in the Parcel Settings. So, I just re-used the old one for the beta, but there is an annoying reminder that Kudu is still beta. Is there a new parcel URL that is not for the beta? Thanks,

Re: Spark on Kudu

2016-09-20 Thread Benjamin Kim
Thanks! > On Sep 20, 2016, at 3:02 PM, Jordan Birdsell > wrote: > > http://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark > > On Tue, Sep 20, 2016 at 5:00 PM
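The linked page boils down to the kudu-spark data source; a sketch of the post-1.0 API the thread refers to (master address and table name are illustrative):

    import org.apache.kudu.spark.kudu._

    val df = sqlContext.read
      .option("kudu.master", "kudu-master.example.com:7051")
      .option("kudu.table", "my_table")
      .format("org.apache.kudu.spark.kudu")
      .load()
    df.show()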
