Rafael Mendes,

Are you from ?

Thanks.
> On 21 Feb 2022, at 15:33, Danilo Sousa <danilosousa...@gmail.com> wrote:
> 
> Yes, this is only a single file.
> 
> Thanks, Rafael Mendes.
> 
>> On 13 Feb 2022, at 07:13, Rafael Mendes <rafaelpir...@gmail.com> wrote:
>> 
>> Hi, Danilo.
>> Do you have only a single large file?
>> If so, I guess you can use tools like sed/awk to split it into separate files, 
>> one per layout, and then read those files into Spark.
>> 
>> 
>> On Wed, 9 Feb 2022 at 09:30, Bitfox <bit...@bitfox.top> wrote:
>> Hi
>> 
>> I am not sure I understand the whole situation.
>> But if you want a Scala solution, I think you could use regexes to match and 
>> capture the fields.
>> Here is one I wrote that you can adapt on your end.
>> 
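>> import scala.io.Source
>> import scala.collection.mutable.ArrayBuffer
>> 
>> // list1 collects lines with two or more '#', list2 lines with exactly one
>> val list1 = ArrayBuffer[(String,String,String)]()
>> val list2 = ArrayBuffer[(String,String)]()
>> 
>> // patt1 is tried first; its first group is greedy, so on a line with more
>> // than two '#' it keeps everything before the last two fields
>> val patt1 = """^(.*)#(.*)#([^#]*)$""".r
>> val patt2 = """^(.*)#([^#]*)$""".r
>> 
>> val file = "1.txt"
>> val source = Source.fromFile(file)
>> 
>> // lines without '#' (title, blank lines) fall through to "no match"
>> for ( x <- source.getLines() ) {
>>   x match {
>>     case patt1(k,v,z) => list1 += ((k,v,z))
>>     case patt2(k,v) => list2 += ((k,v))
>>     case _ => println("no match")
>>   }
>> }
>> source.close() // release the file handle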
>> 
>> 
>> Now list1 and list2 hold the elements you wanted, and you can convert them 
>> to a DataFrame easily.
>> 
>> Thanks.
>> 
>> On Wed, Feb 9, 2022 at 7:20 PM Danilo Sousa <danilosousa...@gmail.com> wrote:
>> Hello
>> 
>> 
>> Yes, I can open that block as CSV with # as the delimiter, but the file also 
>> has a block that is not in CSV format. 
>> 
>> That block looks more like key-value pairs. 
>> 
>> We have two different layouts in the same file. That is the “problem”.
>> 
>> Thanks for your time.
>> 
>> 
>> 
>>> Relação de Beneficiários Ativos e Excluídos
>>> Carteira em#27/12/2019##Todos os Beneficiários
>>> Operadora#AMIL
>>> Filial#SÃO PAULO#Unidade#Guarulhos
>>> 
>>> Contrato#123456 - Test
>>> Empresa#Test
>> 
>>> On 9 Feb 2022, at 00:58, Bitfox <bit...@bitfox.top> wrote:
>>> 
>>> Hello
>>> 
>>> You can treat it as a CSV file and load it with Spark:
>>> 
>>> >>> df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").option("sep", "#").load(csv_file)
>>> >>> df.show()
>>> +--------------------+-------------------+-----------------+
>>> |               Plano|Código Beneficiário|Nome Beneficiário|
>>> +--------------------+-------------------+-----------------+
>>> |58693 - NACIONAL ...|           65751353|       Jose Silva|
>>> |58693 - NACIONAL ...|           65751388|      Joana Silva|
>>> |58693 - NACIONAL ...|           65751353|     Felipe Silva|
>>> |58693 - NACIONAL ...|           65751388|      Julia Silva|
>>> +--------------------+-------------------+-----------------+
>>> 
>>> 
>>> cat csv_file:
>>> 
>>> Plano#Código Beneficiário#Nome Beneficiário
>>> 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
>>> 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
>>> 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
>>> 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
>>> 
>>> 
>>> Regards
>>> 
>>> 
>>> On Wed, Feb 9, 2022 at 12:50 AM Danilo Sousa <danilosousa...@gmail.com> wrote:
>>> Hi
>>> I need to transform unstructured text into a DataFrame.
>>> Could anyone please help with Scala code?
>>> 
>>> The DataFrame needs these columns:
>>> 
>>> operadora filial unidade contrato empresa plano codigo_beneficiario 
>>> nome_beneficiario
>>> 
>>> Relação de Beneficiários Ativos e Excluídos
>>> Carteira em#27/12/2019##Todos os Beneficiários
>>> Operadora#AMIL
>>> Filial#SÃO PAULO#Unidade#Guarulhos
>>> 
>>> Contrato#123456 - Test
>>> Empresa#Test
>>> Plano#Código Beneficiário#Nome Beneficiário
>>> 58693 - NACIONAL R COPART PJCE#073930312#Joao Silva
>>> 58693 - NACIONAL R COPART PJCE#073930313#Maria Silva
>>> 
>>> Contrato#898011000 - FUNDACAO GERDAU
>>> Empresa#FUNDACAO GERDAU
>>> Plano#Código Beneficiário#Nome Beneficiário
>>> 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva
>>> 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva
>>> 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva
>>> 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>> 
>> 
> 
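
Following up on the two layouts discussed above, here is a minimal sketch in plain Scala (no Spark needed to run it) of one way to get the requested columns out of the single file. It carries the most recent Operadora, Filial/Unidade, Contrato and Empresa values forward and attaches them to each beneficiary line. The names Beneficiario, Contexto and ParseRelatorio are hypothetical. The sketch assumes the file is small enough to read on one machine, that the keys appear exactly as in the sample, and that every beneficiary line has exactly three #-separated fields.

```scala
// Hypothetical record type; the snake_case field names mirror the columns
// requested in the thread.
final case class Beneficiario(
  operadora: String, filial: String, unidade: String,
  contrato: String, empresa: String,
  plano: String, codigo_beneficiario: String, nome_beneficiario: String)

// Header values seen so far, carried forward to the beneficiary lines below them.
final case class Contexto(
  operadora: String = "", filial: String = "", unidade: String = "",
  contrato: String = "", empresa: String = "")

object ParseRelatorio {
  def parse(lines: Iterator[String]): Vector[Beneficiario] = {
    val start = (Contexto(), Vector.empty[Beneficiario])
    val (_, rows) = lines.foldLeft(start) { case ((ctx, acc), line) =>
      // limit -1 keeps empty trailing fields, so the field count is reliable
      line.split("#", -1) match {
        case Array("Operadora", v) => (ctx.copy(operadora = v), acc)
        case Array("Filial", f, "Unidade", u) =>
          (ctx.copy(filial = f, unidade = u), acc)
        case Array("Contrato", v) => (ctx.copy(contrato = v), acc)
        case Array("Empresa", v) => (ctx.copy(empresa = v), acc)
        case Array("Plano", _, _) => (ctx, acc) // column header of the table block
        case Array(plano, codigo, nome) =>
          (ctx, acc :+ Beneficiario(ctx.operadora, ctx.filial, ctx.unidade,
            ctx.contrato, ctx.empresa, plano, codigo, nome))
        case _ => (ctx, acc) // title, "Carteira em" line, blank lines
      }
    }
    rows
  }
}
```

To run it on the file, something like this should work ("1.txt" is the file name used earlier in the thread):

    val src = scala.io.Source.fromFile("1.txt")
    val rows = try ParseRelatorio.parse(src.getLines()) finally src.close()

The codes are kept as strings, so leading zeros such as 065751353 survive (the inferSchema output earlier in the thread shows them dropped). With a SparkSession named spark in scope, "import spark.implicits._" followed by "rows.toDF()" should turn the result into a DataFrame whose columns take the case class field names.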
