It only makes sense if the underlying file is also splittable, and even
then, it doesn't really do anything for you unless you explicitly tell
Spark about the split boundaries.
On Tue, Jan 14, 2020 at 7:36 PM Someshwar Kale wrote:
I would suggest using another compression technique that is splittable,
e.g. bzip2, LZO, or LZ4.
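As a hedged sketch of why a splittable codec helps: Spark reads bzip2-compressed text natively through textFile, and because bzip2 is splittable, even a single large file can be divided into multiple input splits and decompressed by several tasks in parallel. The file path here is a hypothetical example, and `spark` is assumed to be an existing SparkSession.

```scala
// Sketch, assuming an existing SparkSession `spark` and a bzip2-compressed
// text file at the (hypothetical) path data/events.log.bz2.
// bzip2 is a splittable codec, so Spark can split this single file into
// multiple partitions and decompress the blocks in parallel.
val lines = spark.sparkContext.textFile("data/events.log.bz2")

// For a sufficiently large file this is typically greater than 1,
// unlike gzip or 7z, where the whole file lands in one partition/task.
println(lines.getNumPartitions)
```

With a non-splittable format, the same call would still work but the entire file would be handled by a single task.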
On Wed, Jan 15, 2020, 1:32 AM Enrico Minack wrote:
Hi,
Spark does not support 7z natively, but you can read any file in Spark:
def read(stream: PortableDataStream): Iterator[String] =
  Seq(stream.getPath()).iterator
spark.sparkContext
.binaryFiles("*.7z")
.flatMap(file => read(file._2))
.toDF("path")
.show(false)
This scales with the number of files: each 7z file is read by a single
task, so a single large archive will not be processed in parallel.
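The read function above only returns the file path as a placeholder. A hedged sketch of what it could do to actually extract text from the archive, assuming Apache Commons Compress (1.20 or later, for SevenZFile.getInputStream) is on the classpath; the whole archive is buffered in memory, so this only suits archives that fit in a task's heap:

```scala
import java.io.{BufferedReader, InputStreamReader}
import scala.collection.mutable.ArrayBuffer
import org.apache.commons.compress.archivers.sevenz.SevenZFile
import org.apache.commons.compress.utils.SeekableInMemoryByteChannel
import org.apache.spark.input.PortableDataStream

// Hypothetical sketch: decompress a whole 7z archive in memory and
// return the lines of all its entries. Each archive is still handled
// by a single task, so this parallelises across files, not within one.
def read(stream: PortableDataStream): Iterator[String] = {
  // SevenZFile needs a seekable channel, so buffer the bytes in memory.
  val sevenZ = new SevenZFile(new SeekableInMemoryByteChannel(stream.toArray()))
  val lines = ArrayBuffer.empty[String]
  try {
    var entry = sevenZ.getNextEntry()
    while (entry != null) {
      if (!entry.isDirectory) {
        val reader = new BufferedReader(
          new InputStreamReader(sevenZ.getInputStream(entry), "UTF-8"))
        var line = reader.readLine()
        while (line != null) { lines += line; line = reader.readLine() }
      }
      entry = sevenZ.getNextEntry()
    }
  } finally {
    sevenZ.close()
  }
  lines.iterator
}
```

Plugging this read into the binaryFiles pipeline above would then yield the decompressed lines instead of the paths.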
Hi,
Is it possible to read a 7z-compressed file in Spark?
Kind Regards
Harsh Takkar