Unsubscribe

2018-02-11 Thread Sandeep Varma


Sandeep Varma
Principal
ZS Associates India Pvt. Ltd.
World Trade Center, Tower 3, Kharadi, Pune 411014, Maharashtra, India
T  |  +91 20 6739 5224  M  |  +91 97 6633 0103
www.zs.com

ZS  Impact where it matters.






Notice: This message, including attachments, may be confidential or privileged. 
If you are not the addressee, notify the sender immediately and delete this 
email from your system.



Re: Schema - DataTypes.NullType

2018-02-11 Thread Nicholas Hakobian
I spent a few minutes poking around in the source code and found this:

The data type representing None, used for the types that cannot be inferred.

https://github.com/apache/spark/blob/branch-2.1/python/pyspark/sql/types.py#L107-L113

Playing around a bit, this is the only use case that I could immediately
come up with: you have some type of placeholder field already in the data,
but it's always null. If you let createDataFrame (and I bet other things
like DataFrameReader would behave similarly) try to infer it directly, it
will error out since it can't infer the schema automatically. Doing
something like the snippet below will allow the data to be used. And, if
memory serves, Hive also has a Null data type for these kinds of situations.

In [9]: from pyspark.sql import Row
   ...: from pyspark.sql.types import StructType, StructField, LongType, NullType
   ...: df = spark.createDataFrame(
   ...:     [Row(id=1, val=None), Row(id=2, val=None)],
   ...:     schema=StructType([StructField('id', LongType()),
   ...:                        StructField('val', NullType())]))

In [10]: df.show()
+---+----+
| id| val|
+---+----+
|  1|null|
|  2|null|
+---+----+


In [11]: df.printSchema()
root
 |-- id: long (nullable = true)
 |-- val: null (nullable = true)


Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com


On Sun, Feb 11, 2018 at 5:40 AM, Jean Georges Perrin  wrote:

> What is the purpose of DataTypes.NullType, especially as you are building a
> schema? Has anyone used it, or seen it as part of a schema auto-generation?
>
>
> (If I keep asking long enough, I may get an answer, no? :) )
>
>
> > On Feb 4, 2018, at 13:15, Jean Georges Perrin  wrote:
> >
> > Any taker on this one? ;)
> >
> >> On Jan 29, 2018, at 16:05, Jean Georges Perrin  wrote:
> >>
> >> Hi Sparkians,
> >>
> >> Can someone tell me what is the purpose of DataTypes.NullType,
> >> especially as you are building a schema?
> >>
> >> Thanks
> >>
> >> jg
> >> -
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>
> >
> >
> > -
> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >
>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: optimize hive query to move a subset of data from one partition table to another table

2018-02-11 Thread amit kumar singh
Hi,

create table emp as
select * from emp_full
where join_date >= date_sub(join_date, 2)

I am trying to select from one table and insert into another table.

I need a way to select the last 2 months of data every time. The table is
partitioned on year, month, and day.
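
What I effectively want to express is something like the sketch below (a rough
sketch only: the partition column names year/month/day and their integer types
are assumptions). Filtering on the partition columns should let Spark prune
partitions instead of scanning the whole table, and I am looking for the most
efficient way to do this:

import java.time.LocalDate
import org.apache.spark.sql.functions.col

// Keep only partitions from the month two months back onward.
// year/month are assumed to be integer partition columns of emp_full.
val start = LocalDate.now().minusMonths(2)
val recent = spark.table("emp_full")
  .where(col("year") * 100 + col("month") >= start.getYear * 100 + start.getMonthValue)

recent.write.mode("overwrite").saveAsTable("emp")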

On Sun, Feb 11, 2018 at 4:30 PM, Richard Qiao 
wrote:

> Would you mind sharing your code with us to analyze?
>
> > On Feb 10, 2018, at 10:18 AM, amit kumar singh 
> wrote:
> >
> > Hi Team,
> >
> > We have a Hive external table which has 50 TB of data, partitioned on
> > year, month, and day.
> >
> > I want to move the last 2 months of data into another table.
> >
> > When I try to do this through Spark, more than 120k tasks are getting
> > created.
> >
> > What is the best way to do this?
> >
> > thanks
> > Rohit
>
>


Re: Unsubscribe

2018-02-11 Thread purna pradeep
Unsubscribe


Re: optimize hive query to move a subset of data from one partition table to another table

2018-02-11 Thread Richard Qiao
Would you mind sharing your code with us to analyze?

> On Feb 10, 2018, at 10:18 AM, amit kumar singh  wrote:
> 
> Hi Team,
> 
> We have a Hive external table which has 50 TB of data, partitioned on
> year, month, and day.
> 
> I want to move the last 2 months of data into another table.
> 
> When I try to do this through Spark, more than 120k tasks are getting created.
> 
> What is the best way to do this?
> 
> thanks
> Rohit


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: saveAsTable does not respect spark.sql.warehouse.dir

2018-02-11 Thread Lian Jiang
Thanks guys. prashanth's idea worked for me. Appreciate it very much!

On Sun, Feb 11, 2018 at 10:20 AM, prashanth t 
wrote:

> Hi Lian,
>
> Please add the command below before creating the table:
> "USE (database_name)"
> By default saveAsTable uses Hive's default database. You might not have
> access to it, and that's what is causing the problems.
>
>
>
> Thanks
> Prashanth Thipparthi
>
>
>
>
>
> On 11 Feb 2018 10:45 pm, "Lian Jiang"  wrote:
>
> I started spark-shell with the command below:
>
> spark-shell --master yarn --conf spark.sql.warehouse.dir="/user/spark"
>
> In spark-shell, the statement below can create a managed table under the
> /user/spark HDFS folder:
>
> spark.sql("CREATE TABLE t5 (i int) USING PARQUET")
>
> However, the statements below still use spark-warehouse in a local folder like
> {currentfolder}/spark-warehouse.
>
> case class SymbolInfo(name: String, sector: String)
>
> val siDS = Seq(
>   SymbolInfo("AAPL", "IT"),
>   SymbolInfo("GOOG", "IT")
> ).toDS()
>
> siDS.write.saveAsTable("siDS")
>
> How can I make saveAsTable respect spark.sql.warehouse.dir when creating
> a managed table? Appreciate any help!
>
>
>


Unsubscribe

2018-02-11 Thread Archit Thakur
Unsubscribe


Re: saveAsTable does not respect spark.sql.warehouse.dir

2018-02-11 Thread prashanth t
Hi Lian,

Please add the command below before creating the table:
"USE (database_name)"
By default saveAsTable uses Hive's default database. You might not have
access to it, and that's what is causing the problems.
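
For example, a minimal sketch along those lines (the database name mydb is only
a placeholder):

// Switch to a database you do have access to before writing the table.
spark.sql("CREATE DATABASE IF NOT EXISTS mydb")
spark.sql("USE mydb")

// saveAsTable now targets mydb instead of Hive's default database ...
siDS.write.saveAsTable("siDS")

// ... or qualify the table name explicitly and skip the USE statement:
// siDS.write.saveAsTable("mydb.siDS")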



Thanks
Prashanth Thipparthi





On 11 Feb 2018 10:45 pm, "Lian Jiang"  wrote:

I started spark-shell with the command below:

spark-shell --master yarn --conf spark.sql.warehouse.dir="/user/spark"

In spark-shell, the statement below can create a managed table under the
/user/spark HDFS folder:

spark.sql("CREATE TABLE t5 (i int) USING PARQUET")

However, the statements below still use spark-warehouse in a local folder like
{currentfolder}/spark-warehouse.

case class SymbolInfo(name: String, sector: String)

val siDS = Seq(
  SymbolInfo("AAPL", "IT"),
  SymbolInfo("GOOG", "IT")
).toDS()

siDS.write.saveAsTable("siDS")

How can I make saveAsTable respect spark.sql.warehouse.dir when creating a
managed table? Appreciate any help!


Re: Spark cannot find tables in Oracle database

2018-02-11 Thread Lian Jiang
Thanks guys for the help!

Georg's proposal fixed the issue. Thanks a lot.

On Sun, Feb 11, 2018 at 7:59 AM, Georg Heiler 
wrote:

> I had the same problem. You need to uppercase all tables prior to storing
> them in Oracle.
> Gourav Sengupta  wrote on Sun, 11 Feb 2018
> at 10:44:
>
>> Hi,
>>
>> since you are using the same user as the schema, I do not think that
>> there is an access issue. Perhaps you might want to see whether there is
>> anything case-sensitive about the table names. I remember once that the
>> table names had to be in lowercase, but that was in MySQL.
>>
>>
>> Regards,
>> Gourav
>>
>> On Sun, Feb 11, 2018 at 2:26 AM, Lian Jiang 
>> wrote:
>>
>>> Hi,
>>>
>>> I am following
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
>>> to query oracle database 12.1 from spark shell 2.11.8.
>>>
>>> val jdbcDF = spark.read
>>>   .format("jdbc")
>>>   .option("url", "jdbc:oracle:thin:@(DESCRIPTION = (ADDRESS = (PROTOCOL = 
>>> TCP)(HOST = 129.106.123.73)(PORT = 1521))(CONNECT_DATA =(SERVER = 
>>> DEDICATED)(SERVICE_NAME = pdb1.subnet1.hadoop.oraclevcn.com)))")
>>>   .option("dbtable", "HADOOP_DEV.SYMBOLINFO")
>>>   .option("user", "hadoop_dev")
>>>   .option("password", "mypassword")
>>>   .load()
>>>
>>> This statement failed due to "ORA-00942: table or view does not exist"
>>> even SymbolInfo table does exist in hadoop_dev schema.
>>>
>>> Any clue? Thanks!
>>>
>>
>>


Re: Spark cannot find tables in Oracle database

2018-02-11 Thread Georg Heiler
I had the same problem. You need to uppercase all tables prior to storing
them in Oracle.
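
Roughly like this (just a sketch, not your actual code: df stands for the
DataFrame you save, and url and the credentials are the ones from your snippet):

import java.util.Properties

val props = new Properties()
props.setProperty("user", "hadoop_dev")
props.setProperty("password", "mypassword")

// Writing with an upper-case name makes Oracle store a plain, case-insensitive
// identifier, so a later read of "HADOOP_DEV.SYMBOLINFO" can actually find it.
df.write
  .mode("overwrite")
  .jdbc(url, "HADOOP_DEV.SYMBOLINFO", props)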
Gourav Sengupta  wrote on Sun, 11 Feb 2018 at
10:44:

> Hi,
>
> since you are using the same user as the schema, I do not think that there
> is an access issue. Perhaps you might want to see whether there is anything
> case-sensitive about the table names. I remember once that the table
> names had to be in lowercase, but that was in MySQL.
>
>
> Regards,
> Gourav
>
> On Sun, Feb 11, 2018 at 2:26 AM, Lian Jiang  wrote:
>
>> Hi,
>>
>> I am following
>> https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
>> to query oracle database 12.1 from spark shell 2.11.8.
>>
>> val jdbcDF = spark.read
>>   .format("jdbc")
>>   .option("url", "jdbc:oracle:thin:@(DESCRIPTION = (ADDRESS = (PROTOCOL = 
>> TCP)(HOST = 129.106.123.73)(PORT = 1521))(CONNECT_DATA =(SERVER = 
>> DEDICATED)(SERVICE_NAME = pdb1.subnet1.hadoop.oraclevcn.com)))")
>>   .option("dbtable", "HADOOP_DEV.SYMBOLINFO")
>>   .option("user", "hadoop_dev")
>>   .option("password", "mypassword")
>>   .load()
>>
>> This statement failed due to "ORA-00942: table or view does not exist"
>> even SymbolInfo table does exist in hadoop_dev schema.
>>
>> Any clue? Thanks!
>>
>
>


Re: Apache Spark - Structured Streaming Query Status - field descriptions

2018-02-11 Thread M Singh
Thanks Richard. I am hoping that the Spark team will, at some point, provide
more detailed documentation.
 

On Sunday, February 11, 2018 2:17 AM, Richard Qiao 
 wrote:
 

Can't find a good source of documentation, but the source code
“org.apache.spark.sql.execution.streaming.ProgressReporter” is helpful to
answer some of them.
For example:
  inputRowsPerSecond = numRecords / inputTimeSec
  processedRowsPerSecond = numRecords / processingTimeSec
This explains why the two rates differ.

On Feb 10, 2018, at 8:42 PM, M Singh  wrote:
Hi:
I am working with Spark 2.2.0 and am looking at the query status console
output.

My application reads from Kafka, performs flatMapGroupsWithState, and then
aggregates the elements for two group counts. The output is sent to a console
sink. I see the following output (with my questions inline).

Please let me know where I can find a detailed description of the query status
fields for Spark Structured Streaming.


StreamExecution: Streaming query made progress: {
  "id" : "8eff62a9-81a8-4142-b332-3e5ec63e06a2",
  "runId" : "21778fbb-406c-4c65-bdef-d9d2c24698ce",
  "name" : null,
  "timestamp" : "2018-02-11T01:18:00.005Z",
  "numInputRows" : 5780,
  "inputRowsPerSecond" : 96.32851690748795,
  "processedRowsPerSecond" : 583.9563548191554,   // Why is processedRowsPerSecond greater than inputRowsPerSecond? Does this include shuffling/grouping?
  "durationMs" : {
    "addBatch" : 9765,          // Is this the time taken to send output to all console output streams?
    "getBatch" : 3,             // Is this the time taken to get the batch from Kafka?
    "getOffset" : 3,            // Is this the time for getting the offset from Kafka?
    "queryPlanning" : 89,       // The value of this field changes with different triggers, but the query is not changing, so why does this change?
    "triggerExecution" : 9898,  // Is this the total time for this trigger?
    "walCommit" : 35            // Is this for checkpointing?
  },
  "stateOperators" : [ {        // What are the two state operators? I am assuming one is flatMapGroupsWithState (the first one).
    "numRowsTotal" : 8,
    "numRowsUpdated" : 1
  }, {
    "numRowsTotal" : 6,         // Is this the group-by state operator? If so, I have two group-bys, so why do I see only one?
    "numRowsUpdated" : 6
  } ],
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[xyz]]",
    "startOffset" : {
      "xyz" : {
        "2" : 9183,
        "1" : 9184,
        "3" : 9184,
        "0" : 9183
      }
    },
    "endOffset" : {
      "xyz" : {
        "2" : 10628,
        "1" : 10629,
        "3" : 10629,
        "0" : 10628
      }
    },
    "numInputRows" : 5780,
    "inputRowsPerSecond" : 96.32851690748795,
    "processedRowsPerSecond" : 583.9563548191554
  } ],
  "sink" : {
    "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink@15fc109c"
  }
}






   

Re: Schema - DataTypes.NullType

2018-02-11 Thread Jean Georges Perrin
What is the purpose of DataTypes.NullType, specially as you are building a 
schema? Have anyone used it or seen it as spart of a schema auto-generation?


(If I keep asking long enough, I may get an answer, no? :) )


> On Feb 4, 2018, at 13:15, Jean Georges Perrin  wrote:
> 
> Any taker on this one? ;)
> 
>> On Jan 29, 2018, at 16:05, Jean Georges Perrin  wrote:
>> 
>> Hi Sparkians,
>> 
>> Can someone tell me what is the purpose of DataTypes.NullType, specially as 
>> you are building a schema?
>> 
>> Thanks
>> 
>> jg
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> 
> 
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Apache Spark - Structured Streaming Query Status - field descriptions

2018-02-11 Thread Richard Qiao
Can't find a good source of documentation, but the source code
“org.apache.spark.sql.execution.streaming.ProgressReporter” is helpful to
answer some of them.

For example:
  inputRowsPerSecond = numRecords / inputTimeSec
  processedRowsPerSecond = numRecords / processingTimeSec
This explains why the two rates differ.
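
If it helps, a minimal sketch (assuming query is the handle returned by
writeStream ... start()) showing that the same numbers are also available
programmatically through StreamingQueryProgress:

val progress = query.lastProgress   // org.apache.spark.sql.streaming.StreamingQueryProgress

println(progress.numInputRows)               // rows read in this trigger
println(progress.inputRowsPerSecond)         // numInputRows / time since the previous trigger
println(progress.processedRowsPerSecond)     // numInputRows / time spent processing this trigger
println(progress.durationMs.get("triggerExecution"))   // total milliseconds for this trigger
println(progress.prettyJson)                 // the full JSON block quoted below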

> On Feb 10, 2018, at 8:42 PM, M Singh  wrote:
> 
> Hi:
> 
> I am working with Spark 2.2.0 and am looking at the query status console
> output.
> 
> My application reads from Kafka, performs flatMapGroupsWithState, and then
> aggregates the elements for two group counts. The output is sent to a console
> sink. I see the following output (with my questions inline).
> 
> Please let me know where I can find a detailed description of the query status
> fields for Spark Structured Streaming.
> 
> 
> StreamExecution: Streaming query made progress: {
>   "id" : "8eff62a9-81a8-4142-b332-3e5ec63e06a2",
>   "runId" : "21778fbb-406c-4c65-bdef-d9d2c24698ce",
>   "name" : null,
>   "timestamp" : "2018-02-11T01:18:00.005Z",
>   "numInputRows" : 5780,
>   "inputRowsPerSecond" : 96.32851690748795,
>   "processedRowsPerSecond" : 583.9563548191554,   // Why is processedRowsPerSecond greater than inputRowsPerSecond? Does this include shuffling/grouping?
>   "durationMs" : {
>     "addBatch" : 9765,          // Is this the time taken to send output to all console output streams?
>     "getBatch" : 3,             // Is this the time taken to get the batch from Kafka?
>     "getOffset" : 3,            // Is this the time for getting the offset from Kafka?
>     "queryPlanning" : 89,       // The value of this field changes with different triggers, but the query is not changing, so why does this change?
>     "triggerExecution" : 9898,  // Is this the total time for this trigger?
>     "walCommit" : 35            // Is this for checkpointing?
>   },
>   "stateOperators" : [ {        // What are the two state operators? I am assuming one is flatMapGroupsWithState (the first one).
>     "numRowsTotal" : 8,
>     "numRowsUpdated" : 1
>   }, {
>     "numRowsTotal" : 6,         // Is this the group-by state operator? If so, I have two group-bys, so why do I see only one?
>     "numRowsUpdated" : 6
>   } ],
>   "sources" : [ {
>     "description" : "KafkaSource[Subscribe[xyz]]",
>     "startOffset" : {
>       "xyz" : {
>         "2" : 9183,
>         "1" : 9184,
>         "3" : 9184,
>         "0" : 9183
>       }
>     },
>     "endOffset" : {
>       "xyz" : {
>         "2" : 10628,
>         "1" : 10629,
>         "3" : 10629,
>         "0" : 10628
>       }
>     },
>     "numInputRows" : 5780,
>     "inputRowsPerSecond" : 96.32851690748795,
>     "processedRowsPerSecond" : 583.9563548191554
>   } ],
>   "sink" : {
>     "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink@15fc109c"
>   }
> }
> 
> 



Re: Spark cannot find tables in Oracle database

2018-02-11 Thread Gourav Sengupta
Hi,

since you are using the same user as the schema, I do not think that there
is an access issue. Perhaps you might want to see whether there is anything
case-sensitive about the table names. I remember once that the table
names had to be in lowercase, but that was in MySQL.


Regards,
Gourav

On Sun, Feb 11, 2018 at 2:26 AM, Lian Jiang  wrote:

> Hi,
>
> I am following
> https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
> to query oracle database 12.1 from spark shell 2.11.8.
>
> val jdbcDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:oracle:thin:@(DESCRIPTION = (ADDRESS = (PROTOCOL = 
> TCP)(HOST = 129.106.123.73)(PORT = 1521))(CONNECT_DATA =(SERVER = 
> DEDICATED)(SERVICE_NAME = pdb1.subnet1.hadoop.oraclevcn.com)))")
>   .option("dbtable", "HADOOP_DEV.SYMBOLINFO")
>   .option("user", "hadoop_dev")
>   .option("password", "mypassword")
>   .load()
>
> This statement failed due to "ORA-00942: table or view does not exist"
> even SymbolInfo table does exist in hadoop_dev schema.
>
> Any clue? Thanks!
>


Re: Spark cannot find tables in Oracle database

2018-02-11 Thread Jörn Franke
Maybe you do not have access to the table/view. In case of a view it could
also be that you do not have access to the underlying table.
Have you tried accessing it with another SQL tool?

> On 11. Feb 2018, at 03:26, Lian Jiang  wrote:
> 
> Hi,
> 
> I am following 
> https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
>  to query oracle database 12.1 from spark shell 2.11.8.
> 
> val jdbcDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:oracle:thin:@(DESCRIPTION = (ADDRESS = (PROTOCOL = 
> TCP)(HOST = 129.106.123.73)(PORT = 1521))(CONNECT_DATA =(SERVER = 
> DEDICATED)(SERVICE_NAME = pdb1.subnet1.hadoop.oraclevcn.com)))")
>   .option("dbtable", "HADOOP_DEV.SYMBOLINFO")
>   .option("user", "hadoop_dev")
>   .option("password", "mypassword")
>   .load()
> This statement failed due to "ORA-00942: table or view does not exist" even 
> SymbolInfo table does exist in hadoop_dev schema.
> 
> Any clue? Thanks!


Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
Hi,
So, is this a bug, or something I need to fix? If it's our issue, how can
we fix it? Please help.

Best,

On Sun, Feb 11, 2018 at 3:49 AM, Shmuel Blitz 
wrote:

> Your table is missing a "PARTITIONED BY" section.
>
> Spark 2.x saves the partition information in the TBLPROPERTIES section.
>
>
> On Sun, Feb 11, 2018 at 10:41 AM, Deepak Sharma 
> wrote:
>
>> I can see it's trying to read the parquet and failing while decompressing
>> using snappy:
>> parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
>>
>> So the table looks good, but this needs to be fixed before you can query
>> the data in Hive.
>>
>> Thanks
>> Deepak
>>
>> On Sun, Feb 11, 2018 at 1:45 PM, ☼ R Nair (रविशंकर नायर) <
>> ravishankar.n...@gmail.com> wrote:
>>
>>> When I do that, and then do a select, it is full of errors. I think Hive
>>> is unable to read the table.
>>>
>>> select * from mine;
>>> OK
>>> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>>> SLF4J: Defaulting to no-operation (NOP) logger implementation
>>> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
>>> further details.
>>> java.lang.reflect.InvocationTargetException
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>> ssorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>> thodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoade
>>> r.java:317)
>>> at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
>>> at org.xerial.snappy.Snappy.(Snappy.java:44)
>>> at parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDec
>>> ompressor.java:62)
>>> at parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBl
>>> ockedDecompressorStream.java:51)
>>> at java.io.DataInputStream.readFully(DataInputStream.java:195)
>>> at java.io.DataInputStream.readFully(DataInputStream.java:169)
>>> at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesI
>>> nput.java:204)
>>> at parquet.column.impl.ColumnReaderImpl.readPageV1(ColumnReader
>>> Impl.java:557)
>>> at parquet.column.impl.ColumnReaderImpl.access$300(ColumnReader
>>> Impl.java:57)
>>> at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImp
>>> l.java:516)
>>> at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImp
>>> l.java:513)
>>> at parquet.column.page.DataPageV1.accept(DataPageV1.java:96)
>>> at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderIm
>>> pl.java:513)
>>> at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderI
>>> mpl.java:505)
>>> at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImp
>>> l.java:607)
>>> at parquet.column.impl.ColumnReaderImpl.(ColumnReaderImpl
>>> .java:351)
>>> at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(C
>>> olumnReadStoreImpl.java:66)
>>> at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(Colu
>>> mnReadStoreImpl.java:61)
>>> at parquet.io.RecordReaderImplementation.(RecordReaderImp
>>> lementation.java:270)
>>> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:134)
>>> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:99)
>>> at parquet.filter2.compat.FilterCompat$NoOpFilter.accept(Filter
>>> Compat.java:154)
>>> at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:99)
>>> at parquet.hadoop.InternalParquetRecordReader.checkRead(Interna
>>> lParquetRecordReader.java:137)
>>> at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(Inte
>>> rnalParquetRecordReader.java:208)
>>> at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecor
>>> dReader.java:201)
>>> at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReade
>>> rWrapper.(ParquetRecordReaderWrapper.java:122)
>>> at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReade
>>> rWrapper.(ParquetRecordReaderWrapper.java:85)
>>> at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputForma
>>> t.getRecordReader(MapredParquetInputFormat.java:72)
>>> at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputForma
>>> tSplit.getRecordReader(FetchOperator.java:673)
>>> at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader
>>> (FetchOperator.java:323)
>>> at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(Fetc
>>> hOperator.java:445)
>>> at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOp
>>> erator.java:414)
>>> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
>>> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
>>> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriv
>>> er.java:233)
>>> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>>> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>>> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver
>>> .java:736)
>>> at 

Re: Spark Dataframe and HIVE

2018-02-11 Thread Shmuel Blitz
Your table is missing a "PARTITIONED BY" section.

Spark 2.x saves the partition information in the TBLPROPERTIES section.
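
For illustration only (df and the year/month/day partition columns are
hypothetical, not taken from your job), this is the kind of write that produces
a partitioned Spark-managed table; the partitioning then shows up in the table
properties rather than as a Hive-style PARTITIONED BY clause:

df.write
  .partitionBy("year", "month", "day")   // hypothetical partition columns
  .format("parquet")
  .saveAsTable("mine_partitioned")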


On Sun, Feb 11, 2018 at 10:41 AM, Deepak Sharma 
wrote:

> I can see it's trying to read the parquet and failing while decompressing
> using snappy:
> parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
>
> So the table looks good, but this needs to be fixed before you can query
> the data in Hive.
>
> Thanks
> Deepak
>
> On Sun, Feb 11, 2018 at 1:45 PM, ☼ R Nair (रविशंकर नायर) <
> ravishankar.n...@gmail.com> wrote:
>
>> When I do that, and then do a select, it is full of errors. I think Hive
>> is unable to read the table.
>>
>> select * from mine;
>> OK
>> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>> SLF4J: Defaulting to no-operation (NOP) logger implementation
>> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
>> further details.
>> java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>> ssorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>> thodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoade
>> r.java:317)
>> at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
>> at org.xerial.snappy.Snappy.(Snappy.java:44)
>> at parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDec
>> ompressor.java:62)
>> at parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBl
>> ockedDecompressorStream.java:51)
>> at java.io.DataInputStream.readFully(DataInputStream.java:195)
>> at java.io.DataInputStream.readFully(DataInputStream.java:169)
>> at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesI
>> nput.java:204)
>> at parquet.column.impl.ColumnReaderImpl.readPageV1(ColumnReader
>> Impl.java:557)
>> at parquet.column.impl.ColumnReaderImpl.access$300(ColumnReader
>> Impl.java:57)
>> at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImp
>> l.java:516)
>> at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImp
>> l.java:513)
>> at parquet.column.page.DataPageV1.accept(DataPageV1.java:96)
>> at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderIm
>> pl.java:513)
>> at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderI
>> mpl.java:505)
>> at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImp
>> l.java:607)
>> at parquet.column.impl.ColumnReaderImpl.(ColumnReaderImpl.java:351)
>> at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(C
>> olumnReadStoreImpl.java:66)
>> at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(Colu
>> mnReadStoreImpl.java:61)
>> at parquet.io.RecordReaderImplementation.(RecordReaderImp
>> lementation.java:270)
>> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:134)
>> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:99)
>> at parquet.filter2.compat.FilterCompat$NoOpFilter.accept(
>> FilterCompat.java:154)
>> at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:99)
>> at parquet.hadoop.InternalParquetRecordReader.checkRead(Interna
>> lParquetRecordReader.java:137)
>> at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(Inte
>> rnalParquetRecordReader.java:208)
>> at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecor
>> dReader.java:201)
>> at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReade
>> rWrapper.(ParquetRecordReaderWrapper.java:122)
>> at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReade
>> rWrapper.(ParquetRecordReaderWrapper.java:85)
>> at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputForma
>> t.getRecordReader(MapredParquetInputFormat.java:72)
>> at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputForma
>> tSplit.getRecordReader(FetchOperator.java:673)
>> at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader
>> (FetchOperator.java:323)
>> at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(Fetc
>> hOperator.java:445)
>> at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOp
>> erator.java:414)
>> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
>> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
>> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriv
>> er.java:233)
>> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>> ssorImpl.java:62)
>> at 

Re: Spark Dataframe and HIVE

2018-02-11 Thread Deepak Sharma
I can see it's trying to read the parquet and failing while decompressing
using snappy:
parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)

So the table looks good, but this needs to be fixed before you can query the
data in Hive.

Thanks
Deepak

On Sun, Feb 11, 2018 at 1:45 PM, ☼ R Nair (रविशंकर नायर) <
ravishankar.n...@gmail.com> wrote:

> When I do that, and then do a select, it is full of errors. I think Hive
> is unable to read the table.
>
> select * from mine;
> OK
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
> details.
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
> at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
> at org.xerial.snappy.Snappy.(Snappy.java:44)
> at parquet.hadoop.codec.SnappyDecompressor.decompress(
> SnappyDecompressor.java:62)
> at parquet.hadoop.codec.NonBlockedDecompressorStream.read(
> NonBlockedDecompressorStream.java:51)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at java.io.DataInputStream.readFully(DataInputStream.java:169)
> at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(
> BytesInput.java:204)
> at parquet.column.impl.ColumnReaderImpl.readPageV1(
> ColumnReaderImpl.java:557)
> at parquet.column.impl.ColumnReaderImpl.access$300(
> ColumnReaderImpl.java:57)
> at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:516)
> at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:513)
> at parquet.column.page.DataPageV1.accept(DataPageV1.java:96)
> at parquet.column.impl.ColumnReaderImpl.readPage(
> ColumnReaderImpl.java:513)
> at parquet.column.impl.ColumnReaderImpl.checkRead(
> ColumnReaderImpl.java:505)
> at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:607)
> at parquet.column.impl.ColumnReaderImpl.(ColumnReaderImpl.java:351)
> at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(
> ColumnReadStoreImpl.java:66)
> at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(
> ColumnReadStoreImpl.java:61)
> at parquet.io.RecordReaderImplementation.(
> RecordReaderImplementation.java:270)
> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:134)
> at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:99)
> at parquet.filter2.compat.FilterCompat$NoOpFilter.
> accept(FilterCompat.java:154)
> at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:99)
> at parquet.hadoop.InternalParquetRecordReader.checkRead(
> InternalParquetRecordReader.java:137)
> at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(
> InternalParquetRecordReader.java:208)
> at parquet.hadoop.ParquetRecordReader.nextKeyValue(
> ParquetRecordReader.java:201)
> at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<
> init>(ParquetRecordReaderWrapper.java:122)
> at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<
> init>(ParquetRecordReaderWrapper.java:85)
> at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.
> getRecordReader(MapredParquetInputFormat.java:72)
> at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.
> getRecordReader(FetchOperator.java:673)
> at org.apache.hadoop.hive.ql.exec.FetchOperator.
> getRecordReader(FetchOperator.java:323)
> at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(
> FetchOperator.java:445)
> at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(
> FetchOperator.java:414)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(
> CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in
> java.library.path
> at 

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
When I do that, and then do a select, it is full of errors. I think Hive
is unable to read the table.

select * from mine;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
details.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
at org.xerial.snappy.Snappy.(Snappy.java:44)
at
parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
at
parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at
parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:204)
at
parquet.column.impl.ColumnReaderImpl.readPageV1(ColumnReaderImpl.java:557)
at parquet.column.impl.ColumnReaderImpl.access$300(ColumnReaderImpl.java:57)
at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:516)
at parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:513)
at parquet.column.page.DataPageV1.accept(DataPageV1.java:96)
at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:513)
at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:505)
at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:607)
at parquet.column.impl.ColumnReaderImpl.(ColumnReaderImpl.java:351)
at
parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:66)
at
parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:61)
at
parquet.io.RecordReaderImplementation.(RecordReaderImplementation.java:270)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:134)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:99)
at
parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:99)
at
parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at
parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:208)
at
parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
at
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:122)
at
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:85)
at
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in
java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
at
org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
... 52 more
Exception in thread "main" org.xerial.snappy.SnappyError:
[FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.(Snappy.java:44)
at
parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
at

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
Why does that matter?

So, are you saying Spark SQL cannot create tables in Hive?

If I need to use HiveContext, how should I change my code?

Best,

On Sun, Feb 11, 2018 at 3:09 AM, Deepak Sharma 
wrote:

> I think this is the problem here.
> You created the table using the Spark SQL context and not the Hive SQL context.
>
> Thanks
> Deepak
>
> On Sun, Feb 11, 2018 at 1:36 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Simple question: have you created the table through Spark SQL or Hive?
>>
>> I recall similar issues a while back.
>>
>> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> //val sqlContext = new HiveContext(sc)
>> println ("\nStarted at"); spark.sql("SELECT FROM_unixtime(unix_timestamp(),
>> 'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println)
>> //
>> // Need to create and populate target ORC table sales in database test in
>> Hive
>> //
>> HiveContext.sql("use test")
>> //
>> // Drop and create table
>> //
>> HiveContext.sql("DROP TABLE IF EXISTS test.dummy2")
>> var sqltext = ""
>> sqltext = """
>> CREATE TABLE test.dummy2
>>  (
>>  ID INT
>>, CLUSTERED INT
>>, SCATTERED INT
>>, RANDOMISED INT
>>, RANDOM_STRING VARCHAR(50)
>>, SMALL_VC VARCHAR(10)
>>, PADDING  VARCHAR(10)
>> )
>> --CLUSTERED BY (ID) INTO 256 BUCKETS
>> STORED AS ORC
>> TBLPROPERTIES ( "orc.compress"="SNAPPY",
>> "orc.create.index"="true",
>> "orc.bloom.filter.columns"="ID",
>> "orc.bloom.filter.fpp"="0.05",
>> "orc.stripe.size"="268435456",
>> "orc.row.index.stride"="1" )
>> """
>> HiveContext.sql(sqltext)
>> //
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 11 February 2018 at 07:54, ☼ R Nair (रविशंकर नायर) <
>> ravishankar.n...@gmail.com> wrote:
>>
>>> Hi,
>>> Here you go:
>>>
>>> hive> show create table mine;
>>> OK
>>> CREATE TABLE `mine`(
>>>   `policyid` int,
>>>   `statecode` string,
>>>   `socialid` string,
>>>   `county` string,
>>>   `eq_site_limit` decimal(10,2),
>>>   `hu_site_limit` decimal(10,2),
>>>   `fl_site_limit` decimal(10,2),
>>>   `fr_site_limit` decimal(10,2),
>>>   `tiv_2014` decimal(10,2),
>>>   `tiv_2015` decimal(10,2),
>>>   `eq_site_deductible` int,
>>>   `hu_site_deductible` int,
>>>   `fl_site_deductible` int,
>>>   `fr_site_deductible` int,
>>>   `latitude` decimal(6,6),
>>>   `longitude` decimal(6,6),
>>>   `line` string,
>>>   `construction` string,
>>>   `point_granularity` int)
>>> ROW FORMAT SERDE
>>>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
>>> WITH SERDEPROPERTIES (
>>>   'path'='hdfs://localhost:8020/user/hive/warehouse/mine')
>>> STORED AS INPUTFORMAT
>>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
>>> OUTPUTFORMAT
>>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
>>> LOCATION
>>>   'file:/Users/ravishankarnair/spark-warehouse/mine'
>>> TBLPROPERTIES (
>>>   'spark.sql.sources.provider'='parquet',
>>>   'spark.sql.sources.schema.numParts'='1',
>>>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fi
>>> elds\":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullab
>>> le\":true,\"metadata\":{\"name\":\"policyid\",\"scale\":
>>> 0}},{\"name\":\"statecode\",\"type\":\"string\",\"nullable\"
>>> :true,\"metadata\":{\"name\":\"statecode\",\"scale\":0}},{\"
>>> name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\"
>>> metadata\":{\"name\":\"Socialid\",\"scale\":0}},{\"
>>> name\":\"county\",\"type\":\"string\",\"nullable\":true,\"
>>> metadata\":{\"name\":\"county\",\"scale\":0}},{\"name\":\"
>>> eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":
>>> true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}},
>>> {\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\"
>>> nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\"
>>> scale\":2}},{\"name\":\"fl_site_limit\",\"type\":\"
>>> decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"
>>> fl_site_limit\",\"scale\":2}},{\"name\":\"fr_site_limit\",\"
>>> type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{
>>> \"name\":\"fr_site_limit\",\"scale\":2}},{\"name\":\"tiv_
>>> 2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
>>> metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},{\"name\"
>>> :\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\":
>>> true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},{\"
>>> name\":\"eq_site_deductible\",\"type\":\"integer\",\"
>>> 

Re: Spark Dataframe and HIVE

2018-02-11 Thread Deepak Sharma
There was a typo:
Instead of :
alter table mine set locations "hdfs://localhost:8020/user/
hive/warehouse/mine";

Use :
alter table mine set location "hdfs://localhost:8020/user/
hive/warehouse/mine";

On Sun, Feb 11, 2018 at 1:38 PM, Deepak Sharma 
wrote:

> Try this in hive:
> alter table mine set locations "hdfs://localhost:8020/user/
> hive/warehouse/mine";
>
> Thanks
> Deepak
>
> On Sun, Feb 11, 2018 at 1:24 PM, ☼ R Nair (रविशंकर नायर) <
> ravishankar.n...@gmail.com> wrote:
>
>> Hi,
>> Here you go:
>>
>> hive> show create table mine;
>> OK
>> CREATE TABLE `mine`(
>>   `policyid` int,
>>   `statecode` string,
>>   `socialid` string,
>>   `county` string,
>>   `eq_site_limit` decimal(10,2),
>>   `hu_site_limit` decimal(10,2),
>>   `fl_site_limit` decimal(10,2),
>>   `fr_site_limit` decimal(10,2),
>>   `tiv_2014` decimal(10,2),
>>   `tiv_2015` decimal(10,2),
>>   `eq_site_deductible` int,
>>   `hu_site_deductible` int,
>>   `fl_site_deductible` int,
>>   `fr_site_deductible` int,
>>   `latitude` decimal(6,6),
>>   `longitude` decimal(6,6),
>>   `line` string,
>>   `construction` string,
>>   `point_granularity` int)
>> ROW FORMAT SERDE
>>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
>> WITH SERDEPROPERTIES (
>>   'path'='hdfs://localhost:8020/user/hive/warehouse/mine')
>> STORED AS INPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
>> LOCATION
>>   'file:/Users/ravishankarnair/spark-warehouse/mine'
>> TBLPROPERTIES (
>>   'spark.sql.sources.provider'='parquet',
>>   'spark.sql.sources.schema.numParts'='1',
>>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"
>> fields\":[{\"name\":\"policyid\",\"type\":\"integer\",\"
>> nullable\":true,\"metadata\":{\"name\":\"policyid\",\"scale\
>> ":0}},{\"name\":\"statecode\",\"type\":\"string\",\"
>> nullable\":true,\"metadata\":{\"name\":\"statecode\",\"
>> scale\":0}},{\"name\":\"Socialid\",\"type\":\"string\"
>> ,\"nullable\":true,\"metadata\":{\"name\":\"Socialid\",\"
>> scale\":0}},{\"name\":\"county\",\"type\":\"string\",\
>> "nullable\":true,\"metadata\":{\"name\":\"county\",\"scale\"
>> :0}},{\"name\":\"eq_site_limit\",\"type\":\"decimal(10,
>> 2)\",\"nullable\":true,\"metadata\":{\"name\":\"eq_
>> site_limit\",\"scale\":2}},{\"name\":\"hu_site_limit\",\"typ
>> e\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"
>> name\":\"hu_site_limit\",\"scale\":2}},{\"name\":\"fl_site_
>> limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
>> metadata\":{\"name\":\"fl_site_limit\",\"scale\":2}},{\"
>> name\":\"fr_site_limit\",\"type\":\"decimal(10,2)\",\"nullab
>> le\":true,\"metadata\":{\"name\":\"fr_site_limit\",\"sca
>> le\":2}},{\"name\":\"tiv_2014\",\"type\":\"decimal(10,2)\",\
>> "nullable\":true,\"metadata\":{\"name\":\"tiv_2014\",\"
>> scale\":2}},{\"name\":\"tiv_2015\",\"type\":\"decimal(10,
>> 2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_
>> 2015\",\"scale\":2}},{\"name\":\"eq_site_deductible\",\"
>> type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":
>> \"eq_site_deductible\",\"scale\":0}},{\"name\":\"hu_
>> site_deductible\",\"type\":\"integer\",\"nullable\":true,\"
>> metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}},{\
>> "name\":\"fl_site_deductible\",\"type\":\"integer\",\"
>> nullable\":true,\"metadata\":{\"name\":\"fl_site_deductible\
>> ",\"scale\":0}},{\"name\":\"fr_site_deductible\",\"type\":
>> \"integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_si
>> te_deductible\",\"scale\":0}},{\"name\":\"latitude\",\"type\
>> ":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":
>> \"latitude\",\"scale\":6}},{\"name\":\"longitude\",\"type\":
>> \"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"
>> longitude\",\"scale\":6}},{\"name\":\"line\",\"type\":\"
>> string\",\"nullable\":true,\"metadata\":{\"name\":\"line\",
>> \"scale\":0}},{\"name\":\"construction\",\"type\":\"
>> string\",\"nullable\":true,\"metadata\":{\"name\":\"
>> construction\",\"scale\":0}},{\"name\":\"point_granularity\"
>> ,\"type\":\"integer\",\"nullable\":true,\"metadata\":{
>> \"name\":\"point_granularity\",\"scale\":0}}]}',
>>   'transient_lastDdlTime'='1518335598')
>> Time taken: 0.13 seconds, Fetched: 35 row(s)
>>
>> On Sun, Feb 11, 2018 at 2:36 AM, Shmuel Blitz <
>> shmuel.bl...@similarweb.com> wrote:
>>
>>> Please run the following command, and paste the result:
>>> SHOW CREATE TABLE <>
>>>
>>> On Sun, Feb 11, 2018 at 7:56 AM, ☼ R Nair (रविशंकर नायर) <
>>> ravishankar.n...@gmail.com> wrote:
>>>
 No, No luck.

 Thanks

 On Sun, Feb 11, 2018 at 12:48 AM, Deepak Sharma 
 wrote:

> In hive cli:
> msck repair table 《table_name》;
>
> Thanks
> Deepak
>
> On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)" <
> ravishankar.n...@gmail.com> wrote:
>
>> No, can you please explain the 

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
Sorry Mich. I did not create it using an explicit CREATE statement. Instead I
used the following:

//Created a data frame loading from MYSQL

passion_df.write.saveAsTable("default.mine")

After logging into Hive, Hive shows the table, but I cannot select the data.

On Sun, Feb 11, 2018 at 3:08 AM, ☼ R Nair (रविशंकर नायर) <
ravishankar.n...@gmail.com> wrote:

> I have created it using Spark SQL. Then I want to retrieve it from Hive.
> That's where the issue is. I can still retrieve it from Spark, no problems.
> Why is Hive not giving me the data?
>
>
>
> On Sun, Feb 11, 2018 at 3:06 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Simple question: have you created the table through Spark SQL or Hive?
>>
>> I recall similar issues a while back.
>>
>> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> //val sqlContext = new HiveContext(sc)
>> println ("\nStarted at"); spark.sql("SELECT FROM_unixtime(unix_timestamp(),
>> 'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println)
>> //
>> // Need to create and populate target ORC table sales in database test in
>> Hive
>> //
>> HiveContext.sql("use test")
>> //
>> // Drop and create table
>> //
>> HiveContext.sql("DROP TABLE IF EXISTS test.dummy2")
>> var sqltext = ""
>> sqltext = """
>> CREATE TABLE test.dummy2
>>  (
>>  ID INT
>>, CLUSTERED INT
>>, SCATTERED INT
>>, RANDOMISED INT
>>, RANDOM_STRING VARCHAR(50)
>>, SMALL_VC VARCHAR(10)
>>, PADDING  VARCHAR(10)
>> )
>> --CLUSTERED BY (ID) INTO 256 BUCKETS
>> STORED AS ORC
>> TBLPROPERTIES ( "orc.compress"="SNAPPY",
>> "orc.create.index"="true",
>> "orc.bloom.filter.columns"="ID",
>> "orc.bloom.filter.fpp"="0.05",
>> "orc.stripe.size"="268435456",
>> "orc.row.index.stride"="1" )
>> """
>> HiveContext.sql(sqltext)
>> //
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 11 February 2018 at 07:54, ☼ R Nair (रविशंकर नायर) <
>> ravishankar.n...@gmail.com> wrote:
>>
>>> Hi,
>>> Here you go:
>>>
>>> hive> show create table mine;
>>> OK
>>> CREATE TABLE `mine`(
>>>   `policyid` int,
>>>   `statecode` string,
>>>   `socialid` string,
>>>   `county` string,
>>>   `eq_site_limit` decimal(10,2),
>>>   `hu_site_limit` decimal(10,2),
>>>   `fl_site_limit` decimal(10,2),
>>>   `fr_site_limit` decimal(10,2),
>>>   `tiv_2014` decimal(10,2),
>>>   `tiv_2015` decimal(10,2),
>>>   `eq_site_deductible` int,
>>>   `hu_site_deductible` int,
>>>   `fl_site_deductible` int,
>>>   `fr_site_deductible` int,
>>>   `latitude` decimal(6,6),
>>>   `longitude` decimal(6,6),
>>>   `line` string,
>>>   `construction` string,
>>>   `point_granularity` int)
>>> ROW FORMAT SERDE
>>>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
>>> WITH SERDEPROPERTIES (
>>>   'path'='hdfs://localhost:8020/user/hive/warehouse/mine')
>>> STORED AS INPUTFORMAT
>>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
>>> OUTPUTFORMAT
>>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
>>> LOCATION
>>>   'file:/Users/ravishankarnair/spark-warehouse/mine'
>>> TBLPROPERTIES (
>>>   'spark.sql.sources.provider'='parquet',
>>>   'spark.sql.sources.schema.numParts'='1',
>>>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fi
>>> elds\":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullab
>>> le\":true,\"metadata\":{\"name\":\"policyid\",\"scale\":
>>> 0}},{\"name\":\"statecode\",\"type\":\"string\",\"nullable\"
>>> :true,\"metadata\":{\"name\":\"statecode\",\"scale\":0}},{\"
>>> name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\"
>>> metadata\":{\"name\":\"Socialid\",\"scale\":0}},{\"
>>> name\":\"county\",\"type\":\"string\",\"nullable\":true,\"
>>> metadata\":{\"name\":\"county\",\"scale\":0}},{\"name\":\"
>>> eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":
>>> true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}},
>>> {\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\"
>>> nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\"
>>> scale\":2}},{\"name\":\"fl_site_limit\",\"type\":\"
>>> decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"
>>> fl_site_limit\",\"scale\":2}},{\"name\":\"fr_site_limit\",\"
>>> type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{
>>> \"name\":\"fr_site_limit\",\"scale\":2}},{\"name\":\"tiv_
>>> 2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
>>> metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},{\"name\"
>>> 

Re: Spark Dataframe and HIVE

2018-02-11 Thread Deepak Sharma
I think this is the problem here.
You created the table using the Spark SQL context and not the Hive SQL context.

Thanks
Deepak

On Sun, Feb 11, 2018 at 1:36 PM, Mich Talebzadeh 
wrote:

> Simple question: have you created the table through Spark SQL or Hive?
>
> I recall similar issues a while back.
>
> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> //val sqlContext = new HiveContext(sc)
> println ("\nStarted at"); spark.sql("SELECT FROM_unixtime(unix_timestamp(),
> 'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println)
> //
> // Need to create and populate target ORC table sales in database test in
> Hive
> //
> HiveContext.sql("use test")
> //
> // Drop and create table
> //
> HiveContext.sql("DROP TABLE IF EXISTS test.dummy2")
> var sqltext = ""
> sqltext = """
> CREATE TABLE test.dummy2
>  (
>  ID INT
>, CLUSTERED INT
>, SCATTERED INT
>, RANDOMISED INT
>, RANDOM_STRING VARCHAR(50)
>, SMALL_VC VARCHAR(10)
>, PADDING  VARCHAR(10)
> )
> --CLUSTERED BY (ID) INTO 256 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ( "orc.compress"="SNAPPY",
> "orc.create.index"="true",
> "orc.bloom.filter.columns"="ID",
> "orc.bloom.filter.fpp"="0.05",
> "orc.stripe.size"="268435456",
> "orc.row.index.stride"="1" )
> """
> HiveContext.sql(sqltext)
> //
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 11 February 2018 at 07:54, ☼ R Nair (रविशंकर नायर) <
> ravishankar.n...@gmail.com> wrote:
>
>> Hi,
>> Here you go:
>>
>> hive> show create table mine;
>> OK
>> CREATE TABLE `mine`(
>>   `policyid` int,
>>   `statecode` string,
>>   `socialid` string,
>>   `county` string,
>>   `eq_site_limit` decimal(10,2),
>>   `hu_site_limit` decimal(10,2),
>>   `fl_site_limit` decimal(10,2),
>>   `fr_site_limit` decimal(10,2),
>>   `tiv_2014` decimal(10,2),
>>   `tiv_2015` decimal(10,2),
>>   `eq_site_deductible` int,
>>   `hu_site_deductible` int,
>>   `fl_site_deductible` int,
>>   `fr_site_deductible` int,
>>   `latitude` decimal(6,6),
>>   `longitude` decimal(6,6),
>>   `line` string,
>>   `construction` string,
>>   `point_granularity` int)
>> ROW FORMAT SERDE
>>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
>> WITH SERDEPROPERTIES (
>>   'path'='hdfs://localhost:8020/user/hive/warehouse/mine')
>> STORED AS INPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
>> LOCATION
>>   'file:/Users/ravishankarnair/spark-warehouse/mine'
>> TBLPROPERTIES (
>>   'spark.sql.sources.provider'='parquet',
>>   'spark.sql.sources.schema.numParts'='1',
>>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"
>> fields\":[{\"name\":\"policyid\",\"type\":\"integer\",\"
>> nullable\":true,\"metadata\":{\"name\":\"policyid\",\"scale\
>> ":0}},{\"name\":\"statecode\",\"type\":\"string\",\"
>> nullable\":true,\"metadata\":{\"name\":\"statecode\",\"
>> scale\":0}},{\"name\":\"Socialid\",\"type\":\"string\"
>> ,\"nullable\":true,\"metadata\":{\"name\":\"Socialid\",\"
>> scale\":0}},{\"name\":\"county\",\"type\":\"string\",\
>> "nullable\":true,\"metadata\":{\"name\":\"county\",\"scale\"
>> :0}},{\"name\":\"eq_site_limit\",\"type\":\"decimal(10,
>> 2)\",\"nullable\":true,\"metadata\":{\"name\":\"eq_
>> site_limit\",\"scale\":2}},{\"name\":\"hu_site_limit\",\"typ
>> e\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"
>> name\":\"hu_site_limit\",\"scale\":2}},{\"name\":\"fl_site_
>> limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
>> metadata\":{\"name\":\"fl_site_limit\",\"scale\":2}},{\"
>> name\":\"fr_site_limit\",\"type\":\"decimal(10,2)\",\"nullab
>> le\":true,\"metadata\":{\"name\":\"fr_site_limit\",\"sca
>> le\":2}},{\"name\":\"tiv_2014\",\"type\":\"decimal(10,2)\",\
>> "nullable\":true,\"metadata\":{\"name\":\"tiv_2014\",\"
>> scale\":2}},{\"name\":\"tiv_2015\",\"type\":\"decimal(10,
>> 2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_
>> 2015\",\"scale\":2}},{\"name\":\"eq_site_deductible\",\"
>> type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":
>> \"eq_site_deductible\",\"scale\":0}},{\"name\":\"hu_
>> site_deductible\",\"type\":\"integer\",\"nullable\":true,\"
>> metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}},{\
>> "name\":\"fl_site_deductible\",\"type\":\"integer\",\"
>> nullable\":true,\"metadata\":{\"name\":\"fl_site_deductible\
>> 

Re: Spark Dataframe and HIVE

2018-02-11 Thread Deepak Sharma
Try this in Hive:
alter table mine set location
"hdfs://localhost:8020/user/hive/warehouse/mine";

Thanks
Deepak
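
Something along these lines should also work from spark-shell (a sketch only;
the HDFS URI is copied from the table's 'path' serde property quoted further
down, so treat it as an assumption and adjust it to your cluster):

// Point the Hive table at the directory the data actually lives in.
// The location string is an assumption taken from the 'path' property below.
spark.sql("ALTER TABLE mine SET LOCATION 'hdfs://localhost:8020/user/hive/warehouse/mine'")
// Refresh cached metadata so the next query picks up the new location.
spark.catalog.refreshTable("mine")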

On Sun, Feb 11, 2018 at 1:24 PM, ☼ R Nair (रविशंकर नायर) <
ravishankar.n...@gmail.com> wrote:

> Hi,
> Here you go:
>
> hive> show create table mine;
> OK
> CREATE TABLE `mine`(
>   `policyid` int,
>   `statecode` string,
>   `socialid` string,
>   `county` string,
>   `eq_site_limit` decimal(10,2),
>   `hu_site_limit` decimal(10,2),
>   `fl_site_limit` decimal(10,2),
>   `fr_site_limit` decimal(10,2),
>   `tiv_2014` decimal(10,2),
>   `tiv_2015` decimal(10,2),
>   `eq_site_deductible` int,
>   `hu_site_deductible` int,
>   `fl_site_deductible` int,
>   `fr_site_deductible` int,
>   `latitude` decimal(6,6),
>   `longitude` decimal(6,6),
>   `line` string,
>   `construction` string,
>   `point_granularity` int)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
>   'path'='hdfs://localhost:8020/user/hive/warehouse/mine')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   'file:/Users/ravishankarnair/spark-warehouse/mine'
> TBLPROPERTIES (
>   'spark.sql.sources.provider'='parquet',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",
> \"fields\":[{\"name\":\"policyid\",\"type\":\"integer\
> ",\"nullable\":true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"
> name\":\"statecode\",\"type\":\"string\",\"nullable\":true,\
> "metadata\":{\"name\":\"statecode\",\"scale\":0}},{\"
> name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\"
> metadata\":{\"name\":\"Socialid\",\"scale\":0}},{\"
> name\":\"county\",\"type\":\"string\",\"nullable\":true,\"
> metadata\":{\"name\":\"county\",\"scale\":0}},{\"name\":\"
> eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":
> true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}},
> {\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\"
> nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\"
> scale\":2}},{\"name\":\"fl_site_limit\",\"type\":\"
> decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"
> fl_site_limit\",\"scale\":2}},{\"name\":\"fr_site_limit\",\"
> type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{
> \"name\":\"fr_site_limit\",\"scale\":2}},{\"name\":\"tiv_
> 2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
> metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},{\"name\"
> :\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\":
> true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},{\"
> name\":\"eq_site_deductible\",\"type\":\"integer\",\"
> nullable\":true,\"metadata\":{\"name\":\"eq_site_deductible\
> ",\"scale\":0}},{\"name\":\"hu_site_deductible\",\"type\":
> \"integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_
> site_deductible\",\"scale\":0}},{\"name\":\"fl_site_
> deductible\",\"type\":\"integer\",\"nullable\":true,\"
> metadata\":{\"name\":\"fl_site_deductible\",\"scale\":0}
> },{\"name\":\"fr_site_deductible\",\"type\":\"
> integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_
> site_deductible\",\"scale\":0}},{\"name\":\"latitude\",\"
> type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
> \"name\":\"latitude\",\"scale\":6}},{\"name\":\"longitude\",
> \"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
> \"name\":\"longitude\",\"scale\":6}},{\"name\":\"line\"
> ,\"type\":\"string\",\"nullable\":true,\"metadata\":{
> \"name\":\"line\",\"scale\":0}},{\"name\":\"construction\",\
> "type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":
> \"construction\",\"scale\":0}},{\"name\":\"point_granularity\",\"type\":\"
> integer\",\"nullable\":true,\"metadata\":{\"name\":\"point_
> granularity\",\"scale\":0}}]}',
>   'transient_lastDdlTime'='1518335598')
> Time taken: 0.13 seconds, Fetched: 35 row(s)
>
> On Sun, Feb 11, 2018 at 2:36 AM, Shmuel Blitz  > wrote:
>
>> Please run the following command, and paste the result:
>> SHOW CREATE TABLE <>
>>
>> On Sun, Feb 11, 2018 at 7:56 AM, ☼ R Nair (रविशंकर नायर) <
>> ravishankar.n...@gmail.com> wrote:
>>
>>> No, No luck.
>>>
>>> Thanks
>>>
>>> On Sun, Feb 11, 2018 at 12:48 AM, Deepak Sharma 
>>> wrote:
>>>
 In hive cli:
 msck repair table 《table_name》;

 Thanks
 Deepak

 On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)" <
 ravishankar.n...@gmail.com> wrote:

> No, can you please explain the command? Let me try now.
>
> Best,
>
> On Sun, Feb 11, 2018 at 12:40 AM, Deepak Sharma  > wrote:
>
>> I am not sure about the exact issue, but I see you are partitioning
>> while writing from Spark.
>> Did you try msck repair on the table before reading it in Hive?
>>
>> Thanks
>> Deepak
>>
>> On Feb 11, 2018 11:06, "☼ R Nair 

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
I have created it using Spark SQL. Then I want to retrieve it from Hive; that's
where the issue is. I can still retrieve it from Spark with no problems. Why
is Hive not giving me the data?
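
For context, the table was written from Spark roughly like this (a simplified
reconstruction rather than the exact code; the DataFrame name df and the
absence of an explicit path option are assumptions):

// Simplified sketch: without an explicit path option, saveAsTable typically
// writes under spark.sql.warehouse.dir, which would match the local
// file:/.../spark-warehouse LOCATION visible in the SHOW CREATE TABLE output
// quoted below and would not be readable by a Hive server looking at HDFS.
df.write.format("parquet").mode("overwrite").saveAsTable("mine")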



On Sun, Feb 11, 2018 at 3:06 AM, Mich Talebzadeh 
wrote:

> simple question have you created the table through spark sql or hive?
>
> I recall similar issues a while back.
>
> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> //val sqlContext = new HiveContext(sc)
> println ("\nStarted at"); spark.sql("SELECT FROM_unixtime(unix_timestamp(),
> 'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println)
> //
> // Need to create and populate target ORC table sales in database test in
> Hive
> //
> HiveContext.sql("use test")
> //
> // Drop and create table
> //
> HiveContext.sql("DROP TABLE IF EXISTS test.dummy2")
> var sqltext = ""
> sqltext = """
> CREATE TABLE test.dummy2
>  (
>  ID INT
>, CLUSTERED INT
>, SCATTERED INT
>, RANDOMISED INT
>, RANDOM_STRING VARCHAR(50)
>, SMALL_VC VARCHAR(10)
>, PADDING  VARCHAR(10)
> )
> --CLUSTERED BY (ID) INTO 256 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ( "orc.compress"="SNAPPY",
> "orc.create.index"="true",
> "orc.bloom.filter.columns"="ID",
> "orc.bloom.filter.fpp"="0.05",
> "orc.stripe.size"="268435456",
> "orc.row.index.stride"="1" )
> """
> HiveContext.sql(sqltext)
> //
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 11 February 2018 at 07:54, ☼ R Nair (रविशंकर नायर) <
> ravishankar.n...@gmail.com> wrote:
>
>> Hi,
>> Here you go:
>>
>> hive> show create table mine;
>> OK
>> CREATE TABLE `mine`(
>>   `policyid` int,
>>   `statecode` string,
>>   `socialid` string,
>>   `county` string,
>>   `eq_site_limit` decimal(10,2),
>>   `hu_site_limit` decimal(10,2),
>>   `fl_site_limit` decimal(10,2),
>>   `fr_site_limit` decimal(10,2),
>>   `tiv_2014` decimal(10,2),
>>   `tiv_2015` decimal(10,2),
>>   `eq_site_deductible` int,
>>   `hu_site_deductible` int,
>>   `fl_site_deductible` int,
>>   `fr_site_deductible` int,
>>   `latitude` decimal(6,6),
>>   `longitude` decimal(6,6),
>>   `line` string,
>>   `construction` string,
>>   `point_granularity` int)
>> ROW FORMAT SERDE
>>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
>> WITH SERDEPROPERTIES (
>>   'path'='hdfs://localhost:8020/user/hive/warehouse/mine')
>> STORED AS INPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
>> LOCATION
>>   'file:/Users/ravishankarnair/spark-warehouse/mine'
>> TBLPROPERTIES (
>>   'spark.sql.sources.provider'='parquet',
>>   'spark.sql.sources.schema.numParts'='1',
>>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"
>> fields\":[{\"name\":\"policyid\",\"type\":\"integer\",\"
>> nullable\":true,\"metadata\":{\"name\":\"policyid\",\"scale\
>> ":0}},{\"name\":\"statecode\",\"type\":\"string\",\"
>> nullable\":true,\"metadata\":{\"name\":\"statecode\",\"
>> scale\":0}},{\"name\":\"Socialid\",\"type\":\"string\"
>> ,\"nullable\":true,\"metadata\":{\"name\":\"Socialid\",\"
>> scale\":0}},{\"name\":\"county\",\"type\":\"string\",\
>> "nullable\":true,\"metadata\":{\"name\":\"county\",\"scale\"
>> :0}},{\"name\":\"eq_site_limit\",\"type\":\"decimal(10,
>> 2)\",\"nullable\":true,\"metadata\":{\"name\":\"eq_
>> site_limit\",\"scale\":2}},{\"name\":\"hu_site_limit\",\"typ
>> e\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"
>> name\":\"hu_site_limit\",\"scale\":2}},{\"name\":\"fl_site_
>> limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
>> metadata\":{\"name\":\"fl_site_limit\",\"scale\":2}},{\"
>> name\":\"fr_site_limit\",\"type\":\"decimal(10,2)\",\"nullab
>> le\":true,\"metadata\":{\"name\":\"fr_site_limit\",\"sca
>> le\":2}},{\"name\":\"tiv_2014\",\"type\":\"decimal(10,2)\",\
>> "nullable\":true,\"metadata\":{\"name\":\"tiv_2014\",\"
>> scale\":2}},{\"name\":\"tiv_2015\",\"type\":\"decimal(10,
>> 2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_
>> 2015\",\"scale\":2}},{\"name\":\"eq_site_deductible\",\"
>> type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":
>> \"eq_site_deductible\",\"scale\":0}},{\"name\":\"hu_
>> site_deductible\",\"type\":\"integer\",\"nullable\":true,\"
>> metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}},{\
>> "name\":\"fl_site_deductible\",\"type\":\"integer\",\"
>> 

Re: Spark Dataframe and HIVE

2018-02-11 Thread Mich Talebzadeh
Simple question: have you created the table through Spark SQL or Hive?

I recall similar issues a while back.

val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
//val sqlContext = new HiveContext(sc)
println ("\nStarted at"); spark.sql("SELECT FROM_unixtime(unix_timestamp(),
'dd/MM/ HH:mm:ss.ss') ").collect.foreach(println)
//
// Need to create and populate the target ORC table test.dummy2 in database
test in Hive
//
HiveContext.sql("use test")
//
// Drop and create table
//
HiveContext.sql("DROP TABLE IF EXISTS test.dummy2")
var sqltext = ""
sqltext = """
CREATE TABLE test.dummy2
 (
 ID INT
   , CLUSTERED INT
   , SCATTERED INT
   , RANDOMISED INT
   , RANDOM_STRING VARCHAR(50)
   , SMALL_VC VARCHAR(10)
   , PADDING  VARCHAR(10)
)
--CLUSTERED BY (ID) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="SNAPPY",
"orc.create.index"="true",
"orc.bloom.filter.columns"="ID",
"orc.bloom.filter.fpp"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="1" )
"""
HiveContext.sql(sqltext)
//
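
A minimal sketch of then populating and reading the table back through the
same HiveContext (it assumes a DataFrame df whose columns line up with
test.dummy2; df is not part of the snippet above):

// Sketch only: 'df' is an assumed DataFrame matching the test.dummy2 columns.
df.write.mode("append").insertInto("test.dummy2")
// Read back through the metastore; the same query should also work from the Hive CLI.
HiveContext.sql("SELECT COUNT(1) FROM test.dummy2").collect.foreach(println)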

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 11 February 2018 at 07:54, ☼ R Nair (रविशंकर नायर) <
ravishankar.n...@gmail.com> wrote:

> Hi,
> Here you go:
>
> hive> show create table mine;
> OK
> CREATE TABLE `mine`(
>   `policyid` int,
>   `statecode` string,
>   `socialid` string,
>   `county` string,
>   `eq_site_limit` decimal(10,2),
>   `hu_site_limit` decimal(10,2),
>   `fl_site_limit` decimal(10,2),
>   `fr_site_limit` decimal(10,2),
>   `tiv_2014` decimal(10,2),
>   `tiv_2015` decimal(10,2),
>   `eq_site_deductible` int,
>   `hu_site_deductible` int,
>   `fl_site_deductible` int,
>   `fr_site_deductible` int,
>   `latitude` decimal(6,6),
>   `longitude` decimal(6,6),
>   `line` string,
>   `construction` string,
>   `point_granularity` int)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
>   'path'='hdfs://localhost:8020/user/hive/warehouse/mine')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   'file:/Users/ravishankarnair/spark-warehouse/mine'
> TBLPROPERTIES (
>   'spark.sql.sources.provider'='parquet',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",
> \"fields\":[{\"name\":\"policyid\",\"type\":\"integer\
> ",\"nullable\":true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"
> name\":\"statecode\",\"type\":\"string\",\"nullable\":true,\
> "metadata\":{\"name\":\"statecode\",\"scale\":0}},{\"
> name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\"
> metadata\":{\"name\":\"Socialid\",\"scale\":0}},{\"
> name\":\"county\",\"type\":\"string\",\"nullable\":true,\"
> metadata\":{\"name\":\"county\",\"scale\":0}},{\"name\":\"
> eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":
> true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}},
> {\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\"
> nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\"
> scale\":2}},{\"name\":\"fl_site_limit\",\"type\":\"
> decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"
> fl_site_limit\",\"scale\":2}},{\"name\":\"fr_site_limit\",\"
> type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{
> \"name\":\"fr_site_limit\",\"scale\":2}},{\"name\":\"tiv_
> 2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
> metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},{\"name\"
> :\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\":
> true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},{\"
> name\":\"eq_site_deductible\",\"type\":\"integer\",\"
> nullable\":true,\"metadata\":{\"name\":\"eq_site_deductible\
> ",\"scale\":0}},{\"name\":\"hu_site_deductible\",\"type\":
> \"integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_
> site_deductible\",\"scale\":0}},{\"name\":\"fl_site_
> deductible\",\"type\":\"integer\",\"nullable\":true,\"
> metadata\":{\"name\":\"fl_site_deductible\",\"scale\":0}
> },{\"name\":\"fr_site_deductible\",\"type\":\"
> integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_
> site_deductible\",\"scale\":0}},{\"name\":\"latitude\",\"
> type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
> \"name\":\"latitude\",\"scale\":6}},{\"name\":\"longitude\",
> \"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
> \"name\":\"longitude\",\"scale\":6}},{\"name\":\"line\"
>