[GitHub] madlib issue #325: Madpack/ic func schema

iyerr3 Sun, 30 Sep 2018 22:36:49 -0700

Github user iyerr3 commented on the issue:

    https://github.com/apache/madlib/pull/325
  
    > > * Use a function-specific name (`"pagerank_vertex"`) instead of a 
generic name with (`"vertex"`) no drop by in the beginning.
    > 
    > This seems doesn't solve the potential conflict with user created table.  
    
    It does with the original suggestion of not dropping a newly created table 
the first time. That suggestion led to this discussion about same table name 
being reused and the need to `drop table` before `create table` (which should 
not actually be a problem - see end of comment). 
    
    >Besides, in some IC/DC file we drop tables in between and create table 
with the same name again. In this case the second drop/creation will fail if 
user already created it.
    
    No; the table *created* in install-check will be in the install-check 
schema and the subsequent drop will look inside that schema. The original issue 
is due to a `drop table` statement on a table that was *not created in the 
file*. 
    
    > > * Create a single `"vertex"` table (with no drop by statement) and use 
that for all graph functions
    >
    > This may cause some assertion based on returning number of records fail 
as I mentioned in the previous comment, and also needs effort of audition and 
combining test data.
    
    If the data required in a table is different for different tests then we 
would benefit by giving it different names. This is a problem mostly in graphs 
where "vertex" and "edge" are re-used. These names are already vague and should 
ideally be renamed. 
     
    > > * Schema-qualify the data table with the install-check schema name.
    >
    > (not sure if I totally understand this) install-check schema is based on 
module, not algo, for example pagerank and hits and sssp use the same IC 
schema, and there will be issue between running two sql files for 
droping/creating same table.
    
    Schema-qualifying was an alternative suggestion to avoid looking into the 
`madlib` schema. With the schema-qualification, we can drop the table just like 
we're doing now (so no conflicts between test files). 
    
    **Also note:** `madpack` is supposed to drop the file-specific schema after 
executing each file (see function 
[_execute_per_module_install_dev_check_algo](https://github.com/apache/madlib/blob/03e82bc50494367c6b65dced66cf0042b3060042/src/madpack/madpack.py#L768)).
 Hence, common table names in independent tests within same module are not 
supposed to conflict with each other (if you've seen this happen then it 
requires investigation). 
    
    Finally, a guideline MADlib followed a few years ago was to not issue drop 
table statements in IC since all created tables are in a unique schema and will 
be dropped at end of test execution (I plead guilty of ignoring this 
guideline). This requires the names of data and output tables within a file to 
be unique - hopefully, we give such tables meaningful names so that we can 
identify the cause of a failure from the name. 
    
    **In summary**: I suggest we don't circumvent the original issue, rather 
fix it by giving meaningful names and avoiding `drop table` statements in 
IC/DC.

---

[GitHub] madlib issue #325: Madpack/ic func schema

Reply via email to