This is all very helpful. I'm weighing 3 ideas...
1. Take Nelson up on his offer and name it something like
DataWarehouse::ETL::Script.
2. Create a top-level namespace for ETL.
3. Make a new namespace like StructuredDataConverter.

Idea 1 ends up with a tree like this...
* DataWarehouse::ETL::Script
* DataWarehouse::ETL::Script::Extract
* DataWarehouse::ETL::Script::Extract::Excel
* DataWarehouse::ETL::Script::Extract::DelimitedText
* DataWarehouse::ETL::Script::Extract::XML
* DataWarehouse::ETL::Script::Load
* DataWarehouse::ETL::Script::Load::MSAccess

Idea 2 looks like so...
* ETL
* ETL::Extract
* ETL::Extract::Excel
* ETL::Extract::DelimitedText
* ETL::Extract::XML
* ETL::Load
* ETL::Load::MSAccess

And 3 like this...
* StructuredDataConverter
* StructuredDataConverter::Extract
* StructuredDataConverter::Extract::Excel
* StructuredDataConverter::Extract::DelimitedText
* StructuredDataConverter::Extract::XML
* StructuredDataConverter::Load
* StructuredDataConverter::Load::MSAccess

#2 is short and sweet. On the downside, acronyms are context-sensitive, so
"ETL" will not mean Extract-Transform-Load to everyone. #3 is long, but
unique, and it describes the functionality. #1 makes use of an existing
namespace.

Here's an example usage...
    use ETL;  # Exports the functions used below
    working_folder 'C:\Data';
    # Open an Excel file named "C:\Data\Input.xlsx"
    extract_from 'Excel', find_file => 'Input.xlsx';
    # Put column A into the "Name" field, and B into "Address"
    transform_as Name => 'A', Address => 'B';
    # Output goes into the "person" table in an Access database
    load_into 'MSAccess', table => 'person';
    # This is a "for" loop that reads each input and writes the output
    run;
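
A minimal sketch of how functions like these could be exported and chained,
assuming a hypothetical ETL::Sketch package that keeps its state in memory.
This is not the real module: the package name, the "rows" option, and the
return value of run are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed module. The package name, the
# in-memory "extract" source, and the return value of run() are all
# assumptions for illustration only.
BEGIN {
    package ETL::Sketch;
    use Exporter 'import';
    our @EXPORT = qw(extract_from transform_as load_into run);

    my ( @input, %mapping, $target );

    # Remember the input rows (a real extractor would open a file instead).
    sub extract_from {
        my ( $type, %options ) = @_;
        @input = @{ $options{rows} || [] };
        return;
    }

    # Remember which input column feeds each output field.
    sub transform_as {
        %mapping = @_;
        return;
    }

    # Remember where the output goes (a real loader would open a database).
    sub load_into {
        my ( $type, %options ) = @_;
        $target = $options{table};
        return;
    }

    # Loop over the input, apply the mapping, and return the records.
    sub run {
        my @records;
        for my $row (@input) {
            my %record;
            $record{$_} = $row->{ $mapping{$_} } for keys %mapping;
            push @records, \%record;
        }
        return { table => $target, records => \@records };
    }
}

# Import at compile time so the functions can be called without parentheses.
BEGIN { ETL::Sketch->import }

extract_from 'Memory', rows => [ { A => 'Ann', B => '1 Main St' } ];
transform_as Name => 'A', Address => 'B';
load_into 'Memory', table => 'person';
my $result = run;
print "$result->{table}: $result->{records}[0]{Name}\n";
```

The compile-time import is what allows the paren-free calls; with a real
module file, a plain "use" statement would do the same job.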

I like 2 for its simplicity. "use ETL;" has a nice succinct quality.

--
Robert W.


On Tue, May 3, 2016 at 6:36 PM Jed Lund <jandrewl...@gmail.com> wrote:

> I don't know what the general feeling is but I've always felt that there
> should be an ETL top-level module namespace (if you don't count
> practical extraction and reporting language :)  The issue is, there doesn't
> appear to be very good community consensus on best practices for ETL
> behavior or methods.   I suspect the variation in that namespace early on
> might be distracting?  Or maybe if you build it they will come?
>
> I notice that you have Extract and Load covered in your proposal.  Do you
> also have transform and logging on the way?
>
> Best Regards,
>
> Jed (JANDREW <https://metacpan.org/author/JANDREW>)
>
> On Tue, May 3, 2016 at 2:23 AM, Nelson Ferraz <nfer...@gmail.com> wrote:
>
>> I'm the maintainer of the DataWarehouse::* modules.
>>
>> Let me know if you would like to use the DataWarehouse::ETL namespace.
>>
>>
>>
>> On Tue, May 3, 2016 at 10:36 AM, Smylers <smyl...@stripey.com> wrote:
>>
>>> Robert Wohlfarth writes:
>>>
>>> > I am looking to release a collection of modules for converting data.
>>> > The modules read data from a source, convert the data, then add it
>>> > into an SQL database.
>>> >
>>> > The modules are named like this...
>>> > * Data::ETL
>>> > * Data::ETL::Extract
>>> > * Data::ETL::Extract::Excel
>>> > * Data::ETL::Extract::DelimitedText
>>> > * Data::ETL::Extract::XML
>>> > * Data::ETL::Load
>>> > * Data::ETL::Load::MSAccess
>>> >
>>> > In my mind, ETL means "Extract-Transform-Load".
>>>
>>> That wouldn't've occurred to me, but the Wikipedia page for ‘Extract,
>>> transform, load’ is the top link when searching DuckDuckGo for “ETL”, so
>>> it seems reasonable to use it in a module name if your target audience
>>> is people already working in the field and familiar with its jargon.
>>>
>>> > Is "Data" an appropriate place?
>>>
>>> Yes ... and no. Data:: is appropriate for pretty much every module on
>>> CPAN, in that an awful lot of code does stuff with data. That makes it a
>>> suboptimal namespace, because it doesn't define what's specific about
>>> this particular module.
>>>
>>> In particular, it didn't to me suggest databases, or even data
>>> warehousing (which the ETL Wikipedia page suggests is the main use of
>>> ETL). It'd be good for the name to indicate that field in some way.
>>>
>>> > Thoughts on the naming convention "Data::ETL"?
>>>
>>> The combination of a very broad namespace and an acronym makes it hard
>>> to guess at the area of the module — for instance that would be an
>>> equally good name for a module that processes data searching for
>>> extra-terrestrial life ...
>>>
>>> If the database-loading part uses DBI connections then the DBIx::
>>> namespace would be good for indicating that.
>>>
>>> Unfortunately for you, DataWarehouse::ETL is already used by another
>>> module. Ideally you'd mention that module in your docs, explaining to
>>> new users the difference between them. If your name can help to indicate
>>> the distinctive feature of yours, so much the better — but often that
>>> isn't possible if they are simply different approaches to the same
>>> problem.
>>>
>>> One possibility for a suite of connected modules that only really work
>>> together is to concoct a ‘fanciful’ brand name for the framework, like
>>> Moose or Catalyst and put all your modules under either $Brand:: or
>>> something like DataWarehouse::$Brand::.
>>>
>>> A framework name works well if, say, your $whatever::Extract::Excel
>>> module is only intended to be used with other modules in your framework
>>> and doesn't really make sense as a standalone module for somebody just
>>> wanting to extract data from an Excel spreadsheet (and get back a Perl
>>> data structure they can do what they want with). The brand name
>>> indicates that it's part of the framework and to be used with that.
>>>
>>> Hope that helps.
>>>
>>> Smylers
>>> --
>>> http://twitter.com/Smylers2
>>>
>>
>>
>>
>> --
>> Nelson Ferraz
>>
>
> --
Robert Wohlfarth
