I am working on an ongoing project involving email, where I will need to import daily CSV files into Perl and build a searchable database from all of the files.
Here is some example data. (Some of these fields/columns may or may not be removed in future CSVs, but this is what we have for now.)

Header:

Timestamp,SenderFromDomain,SenderFromAddress,DMARC,RecipientEmailAddress,Subject,SenderIPv4,Connectors,DeliveryAction,EmailActionPolicy,OrgLevelAction,OrgLevelPolicy,UserLevelAction,UserLevelPolicy,AuthenticationDetails,Context,ReportId,SenderObjectId

Example rows:

"Jan 27, 2026 3:30:56 PM",domain.com,[email protected],pass,[email protected],Thank you for your application,20.1.130.13,,Delivered,,Allow,Connection policy,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,4647d63d-1f9d-4982-6c39-08de5de2f778-18193297287602271192-1,1d3478ee-351f-4ee9-b6ec-7b03ee68e334
"Jan 27, 2026 3:33:04 PM", domain.ar,notifica@ domain.ar,pass,[email protected],Envío de Orden de Compra Aramark Nro. 115615,149.72.150.13,,Delivered,,,,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,976717e0-23ac-4538-a058-08de5de33a88-6451908357547151849-1,
"Jan 27, 2026 3:31:29 PM", domain.com,paradox@ domain.com,pass,[email protected],Please confirm your interview with HR Reps,159.183.2.108,,Delivered,,,,,,"{""SPF"":""pass"",""DKIM"":""pass"",""DMARC"":""pass"",""CompAuth"":""pass""}",,f8d7f41e-fb08-491c-43f4-08de5de30c16-11061410221252786783-1,5767d814-45d6-4a03-bb3b-434692b8edc3

My initial question: since the data will stay around for some time (at least a year), is a database the best thing to import the data into, or would an array be a better approach? Some of the queries I expect to perform are:

"Show me the last time a specific SenderFromAddress had an empty Connectors value."
"Show me the last time a SenderFromAddress had an OrgLevelPolicy value of 'xyz'."

Things like that; basically, the ability to query any combination of fields.

Also, since all the files are in the same format, how do you "ignore" the header row in each file after the first import?

Also, there is the potential for some overlap in the data, albeit small (I am pulling this data from a KQL query in O365). Is there a "routine" I can run against the data to detect and remove any duplicate data? I would like to learn how to do this both during the import and against data that is already loaded. That may seem "extra", but this is all about me learning how to do each of these things.

Is this a good starting place for what I am looking to do: How to read a CSV file using Perl? <https://perlmaven.com/how-to-read-a-csv-file-using-perl>

To make the questions more concrete, I have pasted some rough sketches of what I am picturing below.
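Here is my rough, untested sketch of the import step. I am assuming Text::CSV and DBD::SQLite are installed, and the table name, the column types, and the choice of ReportId as the unique key for de-duplication are just my guesses, not anything I have settled on:

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use DBI;

my $dbfile  = 'mail.db';
my $csvfile = shift @ARGV or die "Usage: $0 file.csv\n";

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", '', '',
    { RaiseError => 1, AutoCommit => 0 });

# Create the table once; the UNIQUE constraint on ReportId is what lets
# SQLite throw away duplicate rows for me (assuming ReportId really is
# unique per message).
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS messages (
        Timestamp             TEXT,
        SenderFromDomain      TEXT,
        SenderFromAddress     TEXT,
        DMARC                 TEXT,
        RecipientEmailAddress TEXT,
        Subject               TEXT,
        SenderIPv4            TEXT,
        Connectors            TEXT,
        DeliveryAction        TEXT,
        EmailActionPolicy     TEXT,
        OrgLevelAction        TEXT,
        OrgLevelPolicy        TEXT,
        UserLevelAction       TEXT,
        UserLevelPolicy       TEXT,
        AuthenticationDetails TEXT,
        Context               TEXT,
        ReportId              TEXT UNIQUE,
        SenderObjectId        TEXT
    )
});

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $fh, '<:encoding(UTF-8)', $csvfile or die "Cannot open $csvfile: $!";

# Every daily file starts with the same header line, so read it once and
# throw it away before looping over the data rows.
$csv->getline($fh);

# INSERT OR IGNORE silently skips any row whose ReportId already exists,
# which should take care of the small overlap between daily pulls.
my $sth = $dbh->prepare(
    'INSERT OR IGNORE INTO messages VALUES (' . join(',', ('?') x 18) . ')'
);

while (my $row = $csv->getline($fh)) {
    $sth->execute(@$row);
}
close $fh;
$dbh->commit;
$dbh->disconnect;

My thinking behind SQLite rather than an in-memory array is that the data has to survive between daily runs, and SQL would let me query arbitrary column combinations without writing a new loop each time; please correct me if that is the wrong instinct.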
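And this is roughly the kind of query and clean-up I have in mind once the data is loaded (again untested, assuming the "messages" table from the sketch above, and "[email protected]" is just a placeholder). I realize the raw "Jan 27, 2026 3:30:56 PM" text will not sort chronologically as-is, so I would probably convert the timestamp to ISO-8601 or epoch seconds at import time:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=mail.db', '', '', { RaiseError => 1 });

# "Show me the last time this sender had an empty Connectors value."
# NOTE: ordering by Timestamp only works once it is stored in a sortable form.
my $sth = $dbh->prepare(q{
    SELECT Timestamp, RecipientEmailAddress, Subject
    FROM messages
    WHERE SenderFromAddress = ?
      AND (Connectors IS NULL OR Connectors = '')
    ORDER BY Timestamp DESC
    LIMIT 1
});
$sth->execute('[email protected]');
if (my $row = $sth->fetchrow_hashref) {
    print "$row->{Timestamp}  $row->{Subject}\n";
}

# One-off clean-up of duplicates already in the table: keep the first copy
# of each ReportId and delete the rest.
$dbh->do(q{
    DELETE FROM messages
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM messages GROUP BY ReportId
    )
});

$dbh->disconnect;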
Thank you,
Rich