Rafael Mendes, Are you from ?
Thanks.

> On 21 Feb 2022, at 15:33, Danilo Sousa <danilosousa...@gmail.com> wrote:
>
> Yes, this is only a single file.
>
> Thanks, Rafael Mendes.
>
>> On 13 Feb 2022, at 07:13, Rafael Mendes <rafaelpir...@gmail.com> wrote:
>>
>> Hi, Danilo.
>> Do you have only a single large file?
>> If so, I guess you can use tools like sed/awk to split it into several files
>> based on layout, so that you can read those files into Spark.
>>
>>
>> On Wed, 9 Feb 2022 at 09:30, Bitfox <bit...@bitfox.top> wrote:
>> Hi
>>
>> I am not sure about the whole situation.
>> But if you want a Scala solution, I think you could use regexes to match and
>> capture the keywords.
>> Here is one I wrote that you can modify on your end.
>>
>> import scala.io.Source
>> import scala.collection.mutable.ArrayBuffer
>>
>> // list1 collects three-field lines, list2 collects key#value lines
>> val list1 = ArrayBuffer[(String,String,String)]()
>> val list2 = ArrayBuffer[(String,String)]()
>>
>> // patt1: two or more '#' (any extra '#' ends up inside the first group)
>> // patt2: tried only after patt1 fails, so it catches exactly one '#'
>> val patt1 = """^(.*)#(.*)#([^#]*)$""".r
>> val patt2 = """^(.*)#([^#]*)$""".r
>>
>> val file = "1.txt"
>> val lines = Source.fromFile(file).getLines()
>>
>> for ( x <- lines ) {
>>   x match {
>>     case patt1(k,v,z) => list1 += ((k,v,z))
>>     case patt2(k,v) => list2 += ((k,v))
>>     case _ => println("no match")
>>   }
>> }
>>
>>
>> Now list1 and list2 hold the elements you wanted, and you can convert them
>> to a DataFrame easily.
>>
>> Thanks.
>>
>> On Wed, Feb 9, 2022 at 7:20 PM Danilo Sousa <danilosousa...@gmail.com> wrote:
>> Hello
>>
>>
>> Yes, I can open this block as CSV with a # delimiter, but there is another
>> block that is not in CSV format.
>>
>> It looks more like key/value pairs.
>>
>> We have two different layouts in the same file. This is the "problem".
>>
>> Thanks for your time.
>>
>>> Relação de Beneficiários Ativos e Excluídos
>>> Carteira em#27/12/2019##Todos os Beneficiários
>>> Operadora#AMIL
>>> Filial#SÃO PAULO#Unidade#Guarulhos
>>>
>>> Contrato#123456 - Test
>>> Empresa#Test
>>
>>> On 9 Feb 2022, at 00:58, Bitfox <bit...@bitfox.top> wrote:
>>>
>>> Hello
>>>
>>> You can treat it as a CSV file and load it from Spark:
>>>
>>>
>>> df = spark.read.format("csv").option("inferSchema", "true") \
>>>     .option("header", "true").option("sep", "#").load(csv_file)
>>>
>>> df.show()
>>> +--------------------+-------------------+-----------------+
>>> |               Plano|Código Beneficiário|Nome Beneficiário|
>>> +--------------------+-------------------+-----------------+
>>> |58693 - NACIONAL ...|           65751353|       Jose Silva|
>>> |58693 - NACIONAL ...|           65751388|      Joana Silva|
>>> |58693 - NACIONAL ...|           65751353|     Felipe Silva|
>>> |58693 - NACIONAL ...|           65751388|      Julia Silva|
>>> +--------------------+-------------------+-----------------+
>>>
>>>
>>> cat csv_file:
>>>
>>> Plano#Código Beneficiário#Nome Beneficiário
>>> 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
>>> 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
>>> 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
>>> 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
>>>
>>>
>>> Regards
>>>
>>>
>>> On Wed, Feb 9, 2022 at 12:50 AM Danilo Sousa <danilosousa...@gmail.com> wrote:
>>> Hi
>>> I have to transform unstructured text into a DataFrame.
>>> Could anyone please help with Scala code?
>>>
>>> The DataFrame needs these columns:
>>>
>>> operadora filial unidade contrato empresa plano codigo_beneficiario nome_beneficiario
>>>
>>> Relação de Beneficiários Ativos e Excluídos
>>> Carteira em#27/12/2019##Todos os Beneficiários
>>> Operadora#AMIL
>>> Filial#SÃO PAULO#Unidade#Guarulhos
>>>
>>> Contrato#123456 - Test
>>> Empresa#Test
>>> Plano#Código Beneficiário#Nome Beneficiário
>>> 58693 - NACIONAL R COPART PJCE#073930312#Joao Silva
>>> 58693 - NACIONAL R COPART PJCE#073930313#Maria Silva
>>>
>>> Contrato#898011000 - FUNDACAO GERDAU
>>> Empresa#FUNDACAO GERDAU
>>> Plano#Código Beneficiário#Nome Beneficiário
>>> 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
>>> 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
>>> 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
>>> 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
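
One possible way to deal with the two layouts in a single pass is to treat the key/value lines as context and attach that context to every beneficiary row that follows. The sketch below is plain Python and was not posted by anyone in the thread. The names parse_report and COLUMNS are hypothetical, and it assumes that every table starts with a "Plano#" header line, that a blank line ends a table, and that the report title and the "Carteira em" line can be ignored.

```python
# Sketch only: parse_report and COLUMNS are hypothetical names, not from the thread.
COLUMNS = ["operadora", "filial", "unidade", "contrato", "empresa",
           "plano", "codigo_beneficiario", "nome_beneficiario"]


def parse_report(lines):
    """Return one tuple per beneficiary row, with the header context attached."""
    ctx = {"operadora": None, "filial": None, "unidade": None,
           "contrato": None, "empresa": None}
    rows = []
    in_table = False  # True while we are below a "Plano#..." header line
    for raw in lines:
        line = raw.strip()
        if not line:
            in_table = False  # assumption: a blank line ends a table
            continue
        fields = line.split("#")
        key = fields[0]
        if key == "Operadora" and len(fields) >= 2:
            ctx["operadora"] = fields[1]
            in_table = False
        elif key == "Filial" and len(fields) >= 4:
            # Layout seen in the sample: Filial#<filial>#Unidade#<unidade>
            ctx["filial"], ctx["unidade"] = fields[1], fields[3]
            in_table = False
        elif key == "Contrato" and len(fields) >= 2:
            ctx["contrato"] = fields[1]
            in_table = False
        elif key == "Empresa" and len(fields) >= 2:
            ctx["empresa"] = fields[1]
            in_table = False
        elif key == "Plano":
            in_table = True  # column header line; data rows follow
        elif in_table and len(fields) == 3:
            # Keep the code as a string so a leading zero is not lost
            rows.append((ctx["operadora"], ctx["filial"], ctx["unidade"],
                         ctx["contrato"], ctx["empresa"],
                         fields[0], fields[1], fields[2]))
        # Anything else (title, "Carteira em" line) is skipped
    return rows


sample = """Operadora#AMIL
Filial#SÃO PAULO#Unidade#Guarulhos

Contrato#123456 - Test
Empresa#Test
Plano#Código Beneficiário#Nome Beneficiário
58693 - NACIONAL R COPART PJCE#073930312#Joao Silva
"""
rows = parse_report(sample.splitlines())
```

If this fits the real file, the list should be usable as spark.createDataFrame(rows, COLUMNS) in PySpark, and the same state-carrying loop could be ported to Scala. Keeping codigo_beneficiario as a string avoids the dropped leading zero visible in the df.show() output quoted above, where 065751353 was inferred as the integer 65751353.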