This is all very helpful. I'm weighing 3 ideas...

1. Take Nelson up on his offer and name it something like
   DataWarehouse::ETL::Script.
2. Create a top-level namespace for ETL.
3. Make a new namespace like StructuredDataConverter.
Idea 1 ends up with a tree like this...

* DataWarehouse::ETL::Script
* DataWarehouse::ETL::Script::Extract
* DataWarehouse::ETL::Script::Extract::Excel
* DataWarehouse::ETL::Script::Extract::DelimitedText
* DataWarehouse::ETL::Script::Extract::XML
* DataWarehouse::ETL::Script::Load
* DataWarehouse::ETL::Script::Load::MSAccess

Idea 2 looks like so...

* ETL
* ETL::Extract
* ETL::Extract::Excel
* ETL::Extract::DelimitedText
* ETL::Extract::XML
* ETL::Load
* ETL::Load::MSAccess

And 3 like this...

* StructuredDataConverter
* StructuredDataConverter::Extract
* StructuredDataConverter::Extract::Excel
* StructuredDataConverter::Extract::DelimitedText
* StructuredDataConverter::Extract::XML
* StructuredDataConverter::Load
* StructuredDataConverter::Load::MSAccess

#2 is short and sweet. On the downside, acronyms are context sensitive.
#3 is long, but unique, and describes the functionality. #1 makes use of
an existing namespace.

Here's an example usage...

    use ETL;  # Exports the functions used below

    working_folder 'C:\Data';

    # Open an Excel file named "C:\Data\Input.xlsx"
    extract_from 'Excel', find_file => 'Input.xlsx';

    # Put column A into the "Name" field, and B into "Address"
    transform_as Name => 'A', Address => 'B';

    # Output goes into the "person" table in an Access database
    load_into 'MSAccess', table => 'person';

    # This is a "for" loop that reads each input and writes the output
    run;

I like 2 for its simplicity. "use ETL;" has a nice succinct quality.

--
Robert W.

On Tue, May 3, 2016 at 6:36 PM Jed Lund <jandrewl...@gmail.com> wrote:

> I don't know what the general feeling is, but I've always felt that there
> should be a top-level ETL module namespace (if you don't count the
> Practical Extraction and Report Language :). The issue is, there doesn't
> appear to be very good community consensus on best practices for ETL
> behavior or methods. I suspect the variation in that namespace early on
> might be distracting? Or maybe if you build it they will come?
>
> I notice that you have Extract and Load covered in your proposal. Do you
> also have transform and logging on the way?
>
> Best Regards,
>
> Jed (JANDREW <https://metacpan.org/author/JANDREW>)
>
> On Tue, May 3, 2016 at 2:23 AM, Nelson Ferraz <nfer...@gmail.com> wrote:
>
>> I'm the maintainer of the DataWarehouse::* modules.
>>
>> Let me know if you would like to use the DataWarehouse::ETL namespace.
>>
>> On Tue, May 3, 2016 at 10:36 AM, Smylers <smyl...@stripey.com> wrote:
>>
>>> Robert Wohlfarth writes:
>>>
>>> > I am looking to release a collection of modules for converting data.
>>> > The modules read data from a source, convert the data, then add it
>>> > into an SQL database.
>>> >
>>> > The modules are named like this...
>>> > * Data::ETL
>>> > * Data::ETL::Extract
>>> > * Data::ETL::Extract::Excel
>>> > * Data::ETL::Extract::DelimitedText
>>> > * Data::ETL::Extract::XML
>>> > * Data::ETL::Load
>>> > * Data::ETL::MSAccess
>>> >
>>> > In my mind, ETL means "Extract-Transform-Load".
>>>
>>> That wouldn't've occurred to me, but the Wikipedia page for ‘Extract,
>>> transform, load’ is the top link when searching DuckDuckGo for “ETL”, so
>>> it seems reasonable to use it in a module name if your target audience
>>> is people already working in the field and familiar with its jargon.
>>>
>>> > Is "Data" an appropriate place?
>>>
>>> Yes ... and no. Data:: is appropriate for pretty much every module on
>>> CPAN, in that an awful lot of code does stuff with data. That makes it a
>>> suboptimal namespace, because it doesn't define what's specific about
>>> this particular module.
>>>
>>> In particular, it didn't suggest databases to me, or even data
>>> warehousing (which the ETL Wikipedia page suggests is the main use of
>>> ETL). It'd be good for the name to indicate that field in some way.
>>>
>>> > Thoughts on the naming convention "Data::ETL"?
>>>
>>> The combination of a very broad namespace and an acronym makes it hard
>>> to guess at the area of the module: for instance, that would be an
>>> equally good name for a module that processes data searching for
>>> extra-terrestrial life ...
>>>
>>> If the database-loading part uses DBI connections, then the DBIx::
>>> namespace would be good for indicating that.
>>>
>>> Unfortunately for you, DataWarehouse::ETL is already used by another
>>> module. Ideally you'd mention that module in your docs, explaining to
>>> new users the difference between them. If your name can help to indicate
>>> the distinctive feature of yours, so much the better, but often that
>>> isn't possible if they are simply different approaches to the same
>>> problem.
>>>
>>> One possibility for a suite of connected modules that only really work
>>> together is to concoct a ‘fanciful’ brand name for the framework, like
>>> Moose or Catalyst, and put all your modules under either $Brand:: or
>>> something like DataWarehouse::$Brand::.
>>>
>>> A framework name works well if, say, your $whatever::Extract::Excel
>>> module is only intended to be used with other modules in your framework
>>> and doesn't really make sense as a standalone module for somebody just
>>> wanting to extract data from an Excel spreadsheet (and get back a Perl
>>> data structure they can do what they want with). The brand name
>>> indicates that it's part of the framework and to be used with that.
>>>
>>> Hope that helps.
>>>
>>> Smylers
>>> --
>>> http://twitter.com/Smylers2
>>>
>>
>> --
>> Nelson Ferraz
>>
>
--
Robert Wohlfarth