Hi Michael,

Hope the details below help you.

1. How should I configure carbon to get performance?

Please refer to the link below on optimizing data loading performance in Carbon:
https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md#configuration-for-optimizing-data-loading-performance-for-massive-data
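As a starting point, the properties below are the ones from that tuning guide that usually matter most for load performance. The values are only illustrative and should be tuned for your cluster; if I remember the defaults correctly, carbon.number.of.cores.while.loading defaults to 2, which could explain why you see only one or two busy cores:

```properties
# carbon.properties (sketch; values are illustrative, tune for your cluster)

# Number of cores used per executor while loading data (default is 2)
carbon.number.of.cores.while.loading=8

# Use unsafe (off-heap) sort during data loading to reduce GC pressure
enable.unsafe.sort=true

# Spread temporary load files across the node's local directories
carbon.use.local.dir=true
```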
2. How to configure carbon.properties?

Add the following to spark-defaults.conf:

Property: spark.driver.extraJavaOptions
Value: -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
Description: A string of extra JVM options to pass to the driver. For instance, GC settings or other logging.

Property: spark.executor.extraJavaOptions
Value: -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
Description: A string of extra JVM options to pass to executors. For instance, GC settings or other logging.

NOTE: You can enter multiple values separated by spaces.

On Tue, Apr 3, 2018 at 6:24 PM, Michael Shtelma <mshte...@gmail.com> wrote:

> Hi Liang,
>
> Many thanks for your answer!
> It has worked in this way.
> I am wondering now how I should configure carbon to get performance
> comparable with parquet.
> Right now I am using the default properties, actually no properties at all.
> I have tried saving one table to carbon, and it took ages compared to
> parquet.
> Should I configure the number of writer threads somewhere, or something
> like that?
> I started the spark shell with the local[*] option, so I hoped that the
> write process would use all available cores, but this was not the case.
> It looks like only one or two cores are actively used.
>
> Another question: where can I place carbon.properties? If I place it
> in the same folder as spark-defaults.properties, will carbon
> automatically use it?
>
> Best,
> Michael
>
>
> On Mon, Apr 2, 2018 at 8:53 AM, Liang Chen <chenliang6...@gmail.com>
> wrote:
> > Hi Michael
> >
> > Yes, it is very easy to save any spark data to carbondata.
> > You just need a small change to your script, as below:
> >
> > myDF.write
> >   .format("carbondata")
> >   .option("tableName", "MyTable")
> >   .mode(SaveMode.Overwrite)
> >   .save()
> >
> > For more detail, you can refer to the examples:
> > https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonDataFrameExample.scala
> >
> >
> > HTH.
> >
> > Regards
> > Liang
> >
> >
> > 2018-03-31 18:15 GMT+08:00 Michael Shtelma <mshte...@gmail.com>:
> >
> >> Hi Team,
> >>
> >> I am new to CarbonData and wanted to test it using a couple of my test
> >> queries.
> >> In my test I have used CarbonData 1.3.1 and Spark 2.2.1.
> >>
> >> I have tried saving my data frame as a carbon data table using the
> >> following command:
> >>
> >> myDF.write.format("carbondata").mode("overwrite").saveAsTable("MyTable")
> >>
> >> As a result I have got the following exception:
> >>
> >> java.lang.IllegalArgumentException: requirement failed: 'path' should
> >> not be specified, the path to store carbon file is the 'storePath'
> >> specified when creating CarbonContext
> >>   at scala.Predef$.require(Predef.scala:224)
> >>   at org.apache.spark.sql.CarbonSource.createRelation(CarbonSource.scala:90)
> >>   at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:449)
> >>   at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:217)
> >>   at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:177)
> >>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> >>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> >>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
> >>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> >>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> >>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
> >>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> >>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
> >>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
> >>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> >>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> >>   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
> >>   at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:419)
> >>   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398)
> >>   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
> >>   ... 54 elided
> >>
> >> I am wondering now if there is a way to save any spark data frame as a
> >> hive table backed by the carbon data format?
> >> Am I doing something wrong?
> >>
> >> Best,
> >> Michael
> >>
> >
> >
> > --
> > Regards
> > Liang
>