Hello,

I have a project whose build.sbt is closest to yours, where I am using Spark 2.1, Scala 2.11 and scalatest (I upgraded to 3.0.0), and it works fine in my projects, though I don't have any of the following libraries that you mention:
- breeze
- netlib-all
- scopt
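For what it's worth, a minimal sketch of a build.sbt along the lines described above; the actual file is not included in the thread, so the project name and exact module list are assumptions:

-----------------------------------------------------------
// hypothetical minimal build.sbt matching the versions mentioned above
name := "spark-21-example"   // assumed project name

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.0",
  "org.apache.spark" %% "spark-sql"   % "2.1.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.0",
  "org.scalatest"    %% "scalatest"   % "3.0.0" % "test"
)
-----------------------------------------------------------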
hth

On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
> Hi,
>
> Thanks for your answer.
>
> I first changed the Scala version to 2.11.8 and kept the Spark version
> 1.5.2 (the old version). Then I changed the scalatest version to "3.0.1".
> With this configuration, I could run the code, compile it and generate
> the .jar file.
>
> When I changed the Spark version to 2.1.0, I got the same error as
> before. So I imagine the problem is somehow related to the version
> of Spark.
>
> Cheers,
> Anahita
>
> ----------------------------------------------------------------
> import AssemblyKeys._
>
> assemblySettings
>
> name := "proxcocoa"
>
> version := "0.1"
>
> organization := "edu.berkeley.cs.amplab"
>
> scalaVersion := "2.11.8"
>
> parallelExecution in Test := false
>
> {
>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>   libraryDependencies ++= Seq(
>     "org.slf4j" % "slf4j-api" % "1.7.2",
>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>     "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>     "org.apache.spark" %% "spark-core" % "2.1.0" excludeAll(excludeHadoop),
>     "org.apache.spark" %% "spark-mllib" % "2.1.0" excludeAll(excludeHadoop),
>     "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
>     "org.apache.commons" % "commons-compress" % "1.7",
>     "commons-io" % "commons-io" % "2.4",
>     "org.scalanlp" % "breeze_2.11" % "0.11.2",
>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>     "com.github.scopt" %% "scopt" % "3.3.0"
>   )
> }
>
> {
>   val defaultHadoopVersion = "1.0.4"
>   val hadoopVersion =
>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
> }
>
> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>
> resolvers ++= Seq(
>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>   "Spray" at "http://repo.spray.cc"
> )
>
> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>   {
>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>     case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
>     case "application.conf" => MergeStrategy.concat
>     case "reference.conf" => MergeStrategy.concat
>     case "log4j.properties" => MergeStrategy.discard
>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>     case _ => MergeStrategy.first
>   }
> }
>
> test in assembly := {}
> ----------------------------------------------------------------
>
> On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>
>> Hello
>> That looks to me like there's something dodgy with your Scala installation.
>> Though Spark 2.0 is built on Scala 2.11, it still supports 2.10... I
>> suggest you change one thing at a time in your sbt:
>> first the Spark version; run it and see if it works.
>> Then amend the Scala version.
>>
>> hth
>> marco
>>
>> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Thank you all for your informative answers.
>>> I actually changed the Scala version to 2.11.8 and the Spark version
>>> to 2.1.0 in the build.sbt.
>>>
>>> Except for these two (the Scala and Spark versions), I kept the same
>>> values for the rest of the build.sbt file.
>>> ---------------------------------------------------------------
>>> import AssemblyKeys._
>>>
>>> assemblySettings
>>>
>>> name := "proxcocoa"
>>>
>>> version := "0.1"
>>>
>>> scalaVersion := "2.11.8"
>>>
>>> parallelExecution in Test := false
>>>
>>> {
>>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>   libraryDependencies ++= Seq(
>>>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>>     "org.apache.spark" % "spark-core_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>     "org.apache.spark" % "spark-mllib_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>     "org.apache.spark" % "spark-sql_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>     "org.apache.commons" % "commons-compress" % "1.7",
>>>     "commons-io" % "commons-io" % "2.4",
>>>     "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>   )
>>> }
>>>
>>> {
>>>   val defaultHadoopVersion = "1.0.4"
>>>   val hadoopVersion =
>>>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>> }
>>>
>>> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0"
>>>
>>> resolvers ++= Seq(
>>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>   "Spray" at "http://repo.spray.cc"
>>> )
>>>
>>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>   {
>>>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>>>     case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
>>>     case "application.conf" => MergeStrategy.concat
>>>     case "reference.conf" => MergeStrategy.concat
>>>     case "log4j.properties" => MergeStrategy.discard
>>>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>>>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>>>     case _ => MergeStrategy.first
>>>   }
>>> }
>>>
>>> test in assembly := {}
>>> ----------------------------------------------------------------
>>>
>>> When I compile the code, I get the following error:
>>>
>>> [info] Compiling 4 Scala sources to /Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.11/classes...
>>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40: value mapPartitionsWithSplit is not a member of org.apache.spark.rdd.RDD[String]
>>> [error]     val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
>>> [error]                      ^
>>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:41: value length is not a member of Any
>>> [error]       Iterator(i -> lines.length)
>>> [error]                           ^
>>> ----------------------------------------------------------------
>>> So the error is in the application code itself. Does this mean that for
>>> different versions of Spark and Scala, I need to change the main code?
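The first error above is the real Spark 2.x incompatibility: mapPartitionsWithSplit was removed from the RDD API, and mapPartitionsWithIndex is the replacement with the same (partition index, iterator) shape. The second error about `length` is just type inference failing once the first call cannot be resolved, so it should disappear with the rename. A minimal sketch of the fix, assuming `data` is an RDD[String] as in the error message (the surrounding OptUtils.scala code is not shown in the thread):

-----------------------------------------------------------
import org.apache.spark.rdd.RDD

// hypothetical rewrite of the failing lines in OptUtils.scala:
// emit one (partitionIndex, lineCount) pair per partition
def partitionSizes(data: RDD[String]): RDD[(Int, Int)] =
  data.mapPartitionsWithIndex { case (i, lines) =>
    Iterator(i -> lines.length)   // lines is an Iterator[String]
  }
-----------------------------------------------------------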
>>>
>>> Thanks,
>>> Anahita
>>>
>>> On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč <dinko.sr...@gmail.com> wrote:
>>>
>>>> Adding to the advice given by others ... Spark 2.1.0 works with Scala
>>>> 2.11, so set:
>>>>
>>>>   scalaVersion := "2.11.8"
>>>>
>>>> When you see something like:
>>>>
>>>>   "org.apache.spark" % "spark-core_2.10" % "1.5.2"
>>>>
>>>> that means that the library `spark-core` is compiled against Scala 2.10,
>>>> so you would have to change that to 2.11:
>>>>
>>>>   "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>>>>
>>>> Better yet, let SBT worry about libraries built against particular
>>>> Scala versions:
>>>>
>>>>   "org.apache.spark" %% "spark-core" % "2.1.0"
>>>>
>>>> The `%%` will instruct SBT to choose the library appropriate for the
>>>> version of Scala that is set in `scalaVersion`.
>>>>
>>>> It may be worth mentioning that the `%%` thing works only with Scala
>>>> libraries, as they are compiled against a certain Scala version. Java
>>>> libraries are unaffected (they have nothing to do with Scala), e.g. for
>>>> `slf4j` one only uses a single `%`:
>>>>
>>>>   "org.slf4j" % "slf4j-api" % "1.7.2"
>>>>
>>>> Cheers,
>>>> Dinko
>>>>
>>>> On 27 March 2017 at 23:30, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>> > check these versions
>>>> >
>>>> > function create_build_sbt_file {
>>>> >   BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
>>>> >   [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
>>>> >   cat >> $BUILD_SBT_FILE << !
>>>> > lazy val root = (project in file(".")).
>>>> >   settings(
>>>> >     name := "${APPLICATION}",
>>>> >     version := "1.0",
>>>> >     scalaVersion := "2.11.8",
>>>> >     mainClass in Compile := Some("myPackage.${APPLICATION}")
>>>> >   )
>>>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
>>>> > libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"
>>>> > libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" % "provided"
>>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
>>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.6.1" % "provided"
>>>> > libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2"
>>>> > libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % "4.6.0-HBase-1.0"
>>>> > libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
>>>> > libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"
>>>> > libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3"
>>>> > libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3"
>>>> > // META-INF discarding
>>>> > mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>> >   {
>>>> >     case PathList("META-INF", xs @ _*) => MergeStrategy.discard
>>>> >     case x => MergeStrategy.first
>>>> >   }
>>>> > }
>>>> > !
>>>> > }
>>>> >
>>>> > HTH
>>>> >
>>>> > Dr Mich Talebzadeh
>>>> >
>>>> > LinkedIn
>>>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> >
>>>> > http://talebzadehmich.wordpress.com
>>>> >
>>>> > Disclaimer: Use it at your own risk. Any and all responsibility for any
>>>> > loss, damage or destruction of data or any other property which may arise
>>>> > from relying on this email's technical content is explicitly disclaimed.
>>>> > The author will in no case be liable for any monetary damages arising
>>>> > from such loss, damage or destruction.
>>>> >
>>>> > On 27 March 2017 at 21:45, Jörn Franke <jornfra...@gmail.com> wrote:
>>>> >>
>>>> >> Usually you define the dependencies on the Spark libraries as provided. You
>>>> >> also seem to mix different Spark versions, which should be avoided.
>>>> >> The Hadoop library seems to be outdated and should also only be provided.
>>>> >>
>>>> >> The other dependencies you could assemble in a fat jar.
>>>> >>
>>>> >> On 27 Mar 2017, at 21:25, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>>>> >>
>>>> >> Hi friends,
>>>> >>
>>>> >> I have code which is written in Scala. Scala version 2.10.4 and
>>>> >> Spark version 1.5.2 are used to run the code.
>>>> >>
>>>> >> I would like to upgrade the code to the most recent version of Spark,
>>>> >> meaning 2.1.0.
>>>> >>
>>>> >> Here is the build.sbt:
>>>> >>
>>>> >> import AssemblyKeys._
>>>> >>
>>>> >> assemblySettings
>>>> >>
>>>> >> name := "proxcocoa"
>>>> >>
>>>> >> version := "0.1"
>>>> >>
>>>> >> scalaVersion := "2.10.4"
>>>> >>
>>>> >> parallelExecution in Test := false
>>>> >>
>>>> >> {
>>>> >>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>> >>   libraryDependencies ++= Seq(
>>>> >>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>> >>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>> >>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>>> >>     "org.apache.spark" % "spark-core_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>> >>     "org.apache.spark" % "spark-mllib_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>> >>     "org.apache.spark" % "spark-sql_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>> >>     "org.apache.commons" % "commons-compress" % "1.7",
>>>> >>     "commons-io" % "commons-io" % "2.4",
>>>> >>     "org.scalanlp" % "breeze_2.10" % "0.11.2",
>>>> >>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>> >>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>> >>   )
>>>> >> }
>>>> >>
>>>> >> {
>>>> >>   val defaultHadoopVersion = "1.0.4"
>>>> >>   val hadoopVersion =
>>>> >>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>> >>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>>> >> }
>>>> >>
>>>> >> libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.0"
>>>> >>
>>>> >> resolvers ++= Seq(
>>>> >>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>> >>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>> >>   "Spray" at "http://repo.spray.cc"
>>>> >> )
>>>> >>
>>>> >> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>> >>   {
>>>> >>     case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
>>>> >>     case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
>>>> >>     case "application.conf" => MergeStrategy.concat
>>>> >>     case "reference.conf" => MergeStrategy.concat
>>>> >>     case "log4j.properties" => MergeStrategy.discard
>>>> >>     case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>>>> >>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>>>> >>     case _ => MergeStrategy.first
>>>> >>   }
>>>> >> }
>>>> >>
>>>> >> test in assembly := {}
>>>> >>
>>>> >> -----------------------------------------------------------
>>>> >> I downloaded Spark
>>>> >> 2.1.0 and changed the Spark version and the Scala version in the
>>>> >> build.sbt. But unfortunately, I failed to run the code.
>>>> >>
>>>> >> Does anybody know how I can upgrade the code to the most recent Spark
>>>> >> version by changing the build.sbt file?
>>>> >>
>>>> >> Or do you have any other suggestion?
>>>> >>
>>>> >> Thanks a lot,
>>>> >> Anahita
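Putting the thread's suggestions together (a single Spark version everywhere, `%%`-style artifacts resolved against scalaVersion, and the Spark libraries marked as provided so they stay out of the assembly, as Jörn and Mich suggest), the dependency block of the proxcocoa build.sbt might end up roughly like the sketch below. It is untested, and the non-Spark versions are simply carried over from the original file:

-----------------------------------------------------------
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark is supplied by the cluster at runtime; "provided" keeps it out of the fat jar
  "org.apache.spark" %% "spark-core"      % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"       % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-mllib"     % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided",
  // application-level dependencies that do go into the assembly
  "org.scalanlp"             %% "breeze"    % "0.11.2",
  "com.github.fommil.netlib"  % "all"       % "1.1.2" pomOnly(),
  "com.github.scopt"         %% "scopt"     % "3.3.0",
  "org.scalatest"            %% "scalatest" % "3.0.1" % "test"
)
-----------------------------------------------------------

If the hadoop-client dependency is still needed, it would likewise be marked as provided, per Jörn's note that it is outdated and should not go into the fat jar.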