Agreed on all points, except one: why should dots in column names be treated
any differently from dots in schema and table names?
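
To make that concrete, here is a minimal sketch of a splitter that applies the
quoting rule uniformly to every segment of a path, so a quoted column name can
carry dots just like a quoted schema or table name. QuotedPathSplitter is a
made-up name for illustration and not an existing MetaModel class, and the
sketch assumes names never contain a quote character themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of MetaModel's API.
final class QuotedPathSplitter {

    // Splits a path on dots, ignoring any dot that sits inside double quotes.
    static List<String> split(String path) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '"') {
                // Quotes only delimit a name; they are not part of it.
                inQuotes = !inQuotes;
            } else if (c == '.' && !inQuotes) {
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quote in path: " + path);
        }
        tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // prints [foo.bar, baz, first.name]
        System.out.println(split("\"foo.bar\".baz.\"first.name\""));
        // prints [foo, bar, baz]
        System.out.println(split("foo.bar.baz"));
    }
}
```

Nothing in the splitter needs to know which segment is the schema, the table
or the column, which is why I don't see a reason to special-case columns.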

2013/8/16 Hans Drexler <[email protected]>:
> I believe that *every* convention will probably have its drawbacks. Using a
> factory can help on one hand, but it can also cause great confusion if things
> get mixed. It also makes things more complex. If we clearly document the
> choice made, I can live with that.
>
> My main point is that we should try to write and document the software in
> such a way that MetaModel users will not get confused. I like the quotes idea,
> since that will allow the user to explicitly express what is intended. But
> then, let's extend it to something like this:
>
> "schema_name"."table_name"."column_name"
>
> Where schema_name and table_name can contain dots ("."). (I guess column
> names cannot...)
>
> I hope you don't mind me rambling about this...
>
> kind regards,
>
> Hans
>
> -----Original Message-----
> From: Kasper Sørensen [mailto:[email protected]]
> Sent: Wednesday, August 14, 2013 2:59 PM
> To: [email protected]
> Subject: Re: [DISCUSS] use folder name as schema name for file based 
> DataContexts
>
> With those different preferences, we could even consider making something 
> like a "TableNameFactory" which converts filenames into table names. But I 
> guess the crucial point is which default convention to use.
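
For what it's worth, such a factory could be as small as the sketch below. Only
the name "TableNameFactory" comes from this thread; the method signature and
the two constants are assumptions made up for illustration, not existing
MetaModel code.

```java
import java.io.File;

// Hypothetical interface; its shape is an assumption, only the name is from the thread.
interface TableNameFactory {
    String createTableName(File file);
}

final class TableNameFactories {

    // Keeps the file name as-is, dots included.
    static final TableNameFactory PRESERVE_DOTS = file -> file.getName();

    // Replaces every dot in the file name with an underscore.
    static final TableNameFactory UNDERSCORED = file -> file.getName().replace('.', '_');

    public static void main(String[] args) {
        File dotted = new File("bar/FOO.PEOPLE.IN.FILE");
        File underscored = new File("bar/FOO_PEOPLE_IN_FILE");
        System.out.println(PRESERVE_DOTS.createTableName(dotted));    // FOO.PEOPLE.IN.FILE
        System.out.println(UNDERSCORED.createTableName(dotted));      // FOO_PEOPLE_IN_FILE
        System.out.println(UNDERSCORED.createTableName(underscored)); // FOO_PEOPLE_IN_FILE
    }
}
```

The last two lines print the same table name for two different files, which is
the collision that the underscore convention would have to live with.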
>
> Underscoring makes it a bit cleaner to look at the column or table paths, but 
> it also makes the representation less direct. A user could start wondering if 
> there are other characters than dots that will be replaced by underscores etc.
>
> It should be noted that MM's parser does support dots in both table and 
> schema names, so this is probably mostly a question of aesthetics.
>
> The ambiguity that you point out is also interesting. So far I haven't seen
> it appear in real life, but technically it could occur that you had two pairs
> of schemas and tables that would generate an ambiguous table path. For
> instance:
>
> Schema: foo.bar
> Table: baz
>
> and
>
> Schema: foo
> Table: bar.baz
>
> The parser would currently favor the second schema ("foo") since it 
> incrementally tries for schema/table/column matches with every dot-separated 
> token. An improvement to the parser would be to allow quote characters, so
> that you could then express your table path like this:
>
> "foo.bar".baz
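
A small sketch of the "shortest schema name first" behaviour described above,
to show how the first pair shadows the second. This is not the actual parser
code; TablePathResolver and the map of schema names to table names are
hypothetical stand-ins for the real metadata lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not MetaModel's parser.
final class TablePathResolver {

    // Tries each dot as the schema/table boundary, from left to right,
    // and returns the first {schema, table} pair that exists, or null.
    static String[] resolve(Map<String, Set<String>> schemas, String path) {
        int dot = path.indexOf('.');
        while (dot >= 0) {
            String schema = path.substring(0, dot);
            String table = path.substring(dot + 1);
            Set<String> tables = schemas.get(schema);
            if (tables != null && tables.contains(table)) {
                return new String[] { schema, table };
            }
            dot = path.indexOf('.', dot + 1);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> schemas = new HashMap<>();
        schemas.put("foo.bar", Collections.singleton("baz"));
        schemas.put("foo", Collections.singleton("bar.baz"));
        // prints foo / bar.baz
        System.out.println(String.join(" / ", resolve(schemas, "foo.bar.baz")));
    }
}
```

With both pairs present, the unquoted path can never reach schema "foo.bar",
which is the case the quoted form is meant to cover.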
>
> Also I want to note that some databases do support dots in
> schema/table/column names, so this ambiguity can (although rarely) also occur
> in an RDBMS or other data sources. It would also be quite common to have some
> separator (not necessarily a dot) in NoSQL database column names, to indicate
> a nested field. In HBase for instance they are referred to using a colon, like
> this: "columnFamily:column".
>
> All in all I am mostly feeling like preserving the dots from the filenames, 
> but am also very curious what other people think!
>
> 2013/8/14 Hans Drexler <[email protected]>:
>> Hi,
>>
>> First I agree with bumping this issue. When at the customer, this thing 
>> caused a lot of time spent in figuring out what was going on. I am not sure 
>> if I like the extension as part of the table name, because:
>> - I would never create a table in a relational database with a dot in
>> the name
>> - It creates an ambiguity. If you have a "full" path name to a column, like
>> "documents.people.csv.name", then it is not clear whether the schema name is
>> "documents.people" and the table name is "csv", or the schema name is
>> "documents" and the table name is "people.csv". It seems natural to me that
>> schema names contain dots, but not table names.
>>
>> Alternatives:
>> - Leave the extension out of the name (probably not acceptable, because then 
>> you can no longer have two "tables" differing only in extension). Although I 
>> must say that personally I think this would be the best solution.
>>
>> - Use a conventional name, like:
>> Schema name: Folder name
>> Table name: The filename, including extension (all dots replaced by 
>> underscores).
>> Resulting in e.g. a column path like this:
>> documents.people_csv.name
>>
>> At the customer site, the file I needed to use actually had a name following
>> this pattern: "bar/FOO.PEOPLE.IN.FILE". Using the convention, this would become:
>> bar.FOO_PEOPLE_IN_FILE
>>
>> IMHO this is preferable to "bar.foo.people.in.file"
>>
>> The problem is of course that it would now be impossible to have
>> another file "bar/FOO_PEOPLE_IN_FILE" :-(
>>
>> I am happy to hear other people's thoughts.
>>
>>
>> Hans
>>
>>
>> -----Original Message-----
>> From: Kasper Sørensen [mailto:[email protected]]
>> Sent: Wednesday, August 14, 2013 10:18 AM
>> To: [email protected]
>> Subject: Re: [DISCUSS] use folder name as schema name for file based
>> DataContexts
>>
>> Rats, made a mistake in that diff. The Gist has been updated [1] and now 
>> contains the ResourceUtils class which was missing before.
>> [1] https://gist.github.com/kaspersorensen/6210970
>>
>> 2013/8/12 Kasper Sørensen <[email protected]>:
>>> Here's a proposed patch (implemented for CSV and fixedwidth files
>>> which are the modules that implemented the old schema naming pattern):
>>> https://gist.github.com/kaspersorensen/6210970
>>>
>>> 2013/8/10 Kasper Sørensen <[email protected]>:
>>>> https://issues.apache.org/jira/browse/METAMODEL-4
>>>>
>>>> 2013/8/10 Henry Saputra <[email protected]>:
>>>>> What is the JIRA for this one?
>>>>>
>>>>>
>>>>> On Fri, Aug 9, 2013 at 2:26 AM, Manuel van den Berg <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> (shouldn't I just vote on the Jira for this?)
>>>>>>
>>>>>> manuel
>>>>>>
>>>>>> > -----Original Message-----
>>>>>> > From: Kasper Sørensen [mailto:[email protected]]
>>>>>> > Sent: Friday, August 09, 2013 9:03
>>>>>> > To: [email protected]
>>>>>> > Subject: Re: [DISCUSS] use folder name as schema name for file
>>>>>> > based DataContexts
>>>>>> >
>>>>>> > Allow me to bump this issue (it's my impression that more people
>>>>>> > have joined in a bit late, after this topic was posted).
>>>>>> >
>>>>>> > I think this is one of the more important issues that I would
>>>>>> > want to fix before we make our first release at Apache.
>>>>>> >
>>>>>> > 2013/7/24 Kasper Sørensen <[email protected]>:
>>>>>> > > Right now we have this slightly odd naming convention for
>>>>>> > > schema and table names when building metadata for e.g. a CSV
>>>>>> > > file or a fixed width value file.
>>>>>> > >
>>>>>> > > Schema name: The filename, including file extension.
>>>>>> > > Table name: The filename without extension.
>>>>>> > > Resulting in e.g. a column path like this:
>>>>>> > > people.csv.people.name
>>>>>> > >
>>>>>> > > I suggest we change it to this convention:
>>>>>> > >
>>>>>> > > Schema name: Folder name
>>>>>> > > Table name: The filename, including file extension.
>>>>>> > > Resulting in e.g. a column path like this:
>>>>>> > > documents.people.csv.name
>>>>>> > >
>>>>>> > > Why do I think this would be an improvement?
>>>>>> > >
>>>>>> > > 1) Because, first of all, it would make sense to the user to
>>>>>> > > see the file system's hierarchy reflected in the schema model.
>>>>>> > > 2) Because it allows us to make these DataContexts operate
>>>>>> > > not on a single file, but on a directory of files. I have seen
>>>>>> > > quite a number of times by now that users of MetaModel, or
>>>>>> > > users of e.g. DataCleaner, which uses MetaModel quite heavily,
>>>>>> > > want to do this sort of stuff.
>>>>>> > > 3) The removing of the file extension stuff is kind of broken
>>>>>> > > and a strange convention in the first place.
>>>>>> > >
>>>>>> > > While this doesn't really break backwards compatibility in
>>>>>> > > terms of Java code, it would break configuration files and
>>>>>> > > other stuff of applications that use MetaModel. But I do
>>>>>> > > believe that can be communicated and handled through carefully
>>>>>> > > explaining the new convention on the migration page (that I recently 
>>>>>> > > started writing [1]).
>>>>>> > >
>>>>>> > > What do you think?
>>>>>> > >
>>>>>> > > [1]
>>>>>> > > http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel
>>>>>>
