In your use case, your deDF need not be a DataFrame. You could use sc.textFile(...).collect(). Even better, since your file is very small, you can just read it off a local file, unless you are planning to use YARN cluster mode. Something like the sketch below.
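For instance (a rough sketch only; dataElementsFile, sc, and hiveContext are the names from your code quoted below, with your query folded into the loop):

    import scala.io.Source

    // The file is small, so read it on the driver; no DataFrame needed.
    // (Assumes a path on the driver's local filesystem; for an HDFS path
    // use sc.textFile(dataElementsFile).collect() instead.)
    val dataElements = Source.fromFile(dataElementsFile)
      .getLines().map(_.trim).filter(_.nonEmpty).toSeq.distinct

    dataElements.foreach { dataElement =>
      val df1 = hiveContext.sql(
        s"SELECT cyc_dt, supplier_proc_i, '$dataElement' as data_elm, " +
        s"$dataElement as data_elm_val FROM TEST_DB.TEST_TABLE1")
      df1.write.insertInto("TEST_DB.TEST_TABLE1")
    }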
On 26 Oct 2016 16:43, "Ajay Chander" <itsche...@gmail.com> wrote:

> Sean, thank you for making it clear. It was helpful.
>
> Regards,
> Ajay
>
> On Wednesday, October 26, 2016, Sean Owen <so...@cloudera.com> wrote:
>
>> This usage is fine, because you are only using the HiveContext locally
>> on the driver. It's applied in a function that's used on a Scala
>> collection.
>>
>> You can't use the HiveContext or SparkContext in a distributed
>> operation. It has nothing to do with for loops.
>>
>> The fact that they're serializable is misleading. I believe it's there
>> because these objects may be inadvertently referenced in the closure of
>> a function that executes remotely, yet doesn't use the context, and the
>> closure cleaner can't always remove that reference; the task would then
>> fail to serialize even though it doesn't use the context. You will find
>> these objects serialize but then don't work if used remotely.
>>
>> The NPE you see is an unrelated cosmetic problem that was fixed in
>> 2.0.1, IIRC.
>>
>> On Wed, Oct 26, 2016 at 4:28 AM Ajay Chander <itsche...@gmail.com> wrote:
>>
>>> Hi Everyone,
>>>
>>> I was wondering whether I can use hiveContext inside foreach, like below:
>>>
>>> object Test {
>>>   def main(args: Array[String]): Unit = {
>>>
>>>     val conf = new SparkConf()
>>>     val sc = new SparkContext(conf)
>>>     val hiveContext = new HiveContext(sc)
>>>
>>>     val dataElementsFile = args(0)
>>>     val deDF = hiveContext.read.text(dataElementsFile)
>>>       .toDF("DataElement").coalesce(1).distinct().cache()
>>>
>>>     def calculate(de: Row): Unit = {
>>>       val dataElement = de.getAs[String]("DataElement").trim
>>>       val df1 = hiveContext.sql("SELECT cyc_dt, supplier_proc_i, '" +
>>>         dataElement + "' as data_elm, " + dataElement +
>>>         " as data_elm_val FROM TEST_DB.TEST_TABLE1")
>>>       df1.write.insertInto("TEST_DB.TEST_TABLE1")
>>>     }
>>>
>>>     deDF.collect().foreach(calculate)
>>>   }
>>> }
>>>
>>> I looked at
>>> https://spark.apache.org/docs/1.6.0/api/scala/index.html#org.apache.spark.sql.hive.HiveContext
>>> and I see it extends SQLContext, which extends Logging with
>>> Serializable.
>>>
>>> Can anyone tell me if this is the right way to use it? Thanks for your
>>> time.
>>>
>>> Regards,
>>> Ajay
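To make Sean's point above concrete (a contrived sketch, reusing the names from the quoted code; only the first form is safe):

    // Fine: collect() brings the rows back to the driver, so the foreach
    // below is a plain Scala loop and hiveContext is only used locally.
    deDF.collect().foreach(calculate)

    // Not fine: DataFrame.foreach is a distributed operation that runs on
    // the executors; hiveContext may serialize, but it won't work remotely.
    deDF.foreach(calculate)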