I am working on an ongoing email-related project where I will need to import daily
CSV files into Perl and create a searchable database of all the files.

Here is some example data:

Header:
(Some of these fields/columns may or may not be removed in future CSVs, but
this is what we have for now)
Timestamp,SenderFromDomain,SenderFromAddress,DMARC,RecipientEmailAddress,Subject,SenderIPv4,Connectors,DeliveryAction,EmailActionPolicy,OrgLevelAction,OrgLevelPolicy,UserLevelAction,UserLevelPolicy,AuthenticationDetails,Context,ReportId,SenderObjectId


Example rows:

"Jan 27, 2026 3:30:56 
PM",domain.com,[email protected],pass,[email protected],Thank you for your 
application,20.1.130.13,,Delivered,,Allow,Connection 
policy,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,4647d63d-1f9d-4982-6c39-08de5de2f778-18193297287602271192-1,1d3478ee-351f-4ee9-b6ec-7b03ee68e334

"Jan 27, 2026 3:33:04 PM", domain.ar,notifica@ 
domain.ar,pass,[email protected],EnvĂ­o de Orden de Compra Aramark Nro. 
115615,149.72.150.13,,Delivered,,,,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,976717e0-23ac-4538-a058-08de5de33a88-6451908357547151849-1,

"Jan 27, 2026 3:31:29 PM", domain.com,paradox@ 
domain.com,pass,[email protected],Please confirm your interview with HR 
Reps,159.183.2.108,,Delivered,,,,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,f8d7f41e-fb08-491c-43f4-08de5de30c16-11061410221252786783-1,5767d814-45d6-4a03-bb3b-434692b8edc3




My initial question is:

Since the data will stay around for some time (at least a year), is a database
the best thing to import the data "into"? Or would an array be a better approach?

Some of the queries I expect to perform are:

"Show me the last time that a specific value in SenderFromAddress had a 
Connector value of "empty""

"Show me the last time that SenderFromAddress had a OrgLevelPolicy value of 
"xyz""

Things like that. Basically, I want to be able to query any combination of fields.
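To make that concrete, this is the kind of parameterized query I mean, assuming
a messages table with the CSV columns (the sketch builds a tiny stand-in table
inline, and the address and timestamp values are made up for the example):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, AutoCommit => 1 } );

# Tiny stand-in table: just the columns this query touches.
$dbh->do( "CREATE TABLE messages (Timestamp TEXT, SenderFromAddress TEXT,"
        . " Connectors TEXT, OrgLevelPolicy TEXT)" );
my $ins = $dbh->prepare("INSERT INTO messages VALUES (?,?,?,?)");
$ins->execute( "2026-01-27T15:30:56", '[email protected]', "",
    "Connection policy" );
$ins->execute( "2026-01-27T15:33:04", '[email protected]', "Outbound", "" );

# "Last time this SenderFromAddress had an empty Connectors value".
# Note: the raw "Jan 27, 2026 3:30:56 PM" strings do not sort
# chronologically, so the sketch stores an ISO-8601 timestamp instead.
my ($last) = $dbh->selectrow_array( q{
    SELECT Timestamp
      FROM messages
     WHERE SenderFromAddress = ?
       AND (Connectors IS NULL OR Connectors = '')
     ORDER BY Timestamp DESC
     LIMIT 1
}, undef, '[email protected]' );

print "last empty-Connectors message: $last\n";
```

Swapping the WHERE clause lets the same pattern cover the OrgLevelPolicy
question, or any other field combination, with placeholders keeping the
values safely out of the SQL string.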



Also, since all the files are in the same format, how do you "ignore" the 
header after the "first import"?
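From what I have read so far, skipping the header is just a matter of reading
the first line before the row loop -- Text::CSV (CPAN) can even use it to give
back hashrefs keyed by column name. A sketch against an in-memory copy of my
data, with the header trimmed to four of the real columns so it fits here:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Sample data trimmed to four of the real columns, held in a string so
# the sketch is self-contained; a real run would open the daily file.
my $data = <<'CSV';
Timestamp,SenderFromDomain,SenderFromAddress,DeliveryAction
"Jan 27, 2026 3:30:56 PM",domain.com,[email protected],Delivered
"Jan 27, 2026 3:33:04 PM",domain.ar,[email protected],Delivered
CSV

my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );
open my $fh, "<", \$data or die $!;

# Read the header once, then tell Text::CSV to use it as column names;
# every later getline_hr() call returns a data row as a hashref and
# never sees the header again.
my $header = $csv->getline($fh);
$csv->column_names(@$header);

my @rows;
while ( my $row = $csv->getline_hr($fh) ) {
    push @rows, $row;
}
print "imported ", scalar(@rows), " data rows\n";
print "first sender: $rows[0]{SenderFromAddress}\n";
```

As a bonus, comparing each new file's first line against the stored header
would also catch the case where columns get removed in a future CSV.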


Also, there is a potential for some overlap in the data, albeit small (I am
pulling this data from a KQL query in O365). Is there a "routine" I can run
against the data to detect and remove any duplicates?
I would like to learn how to do this both during the import and also by running
it against existing data. That may seem "extra", but this is all about me
learning how to do each of these things.

Is this a good starting place for what I am looking to do?

How to read a CSV file using Perl? <https://perlmaven.com/how-to-read-a-csv-file-using-perl>


Thank you,

Rich



