[jira] [Updated] (CASSANDRA-16772) User Defined nodetool cleanup only processes one SSTable per table

Scott Carey (Jira) Tue, 29 Jun 2021 10:39:14 -0700


     [ https://issues.apache.org/jira/browse/CASSANDRA-16772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Scott Carey updated CASSANDRA-16772:
------------------------------------
    Description: 
User-defined nodetool cleanup uses a HashMap instead of a Multimap to group the
user-provided SSTables by table. Because HashMap.put replaces any earlier value
for the same key, only one file per source table is kept and cleaned up.

It also means the existing unit test for this component is insufficient, since
it does not catch this.
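A minimal, Cassandra-free sketch of the failure. Plain `String` keys and values stand in for `ColumnFamilyStore` and `Descriptor` (an assumption for illustration only), and a list-valued `HashMap` plays the role of Guava's `ArrayListMultimap`. It is also the kind of property a regression test for this ticket could assert: two SSTables from the same table must both survive grouping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in types: String keys replace ColumnFamilyStore, String values replace Descriptor.
public class GroupingDemo {
    // Mirrors the broken code: one value per key, so later files overwrite earlier ones.
    public static Map<String, String> groupLastWins(String[][] tableAndFile) {
        Map<String, String> byTable = new HashMap<>();
        for (String[] pair : tableAndFile)
            byTable.put(pair[0], pair[1]); // replaces any earlier file for the same table
        return byTable;
    }

    // Multimap-like grouping: every file is kept in a list under its table.
    public static Map<String, List<String>> groupAll(String[][] tableAndFile) {
        Map<String, List<String>> byTable = new HashMap<>();
        for (String[] pair : tableAndFile)
            byTable.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        return byTable;
    }

    public static void main(String[] args) {
        // Hypothetical input: two SSTables from ks.t1, one from ks.t2.
        String[][] input = {
            {"ks.t1", "nb-1-big-Data.db"},
            {"ks.t1", "nb-2-big-Data.db"},
            {"ks.t2", "nb-1-big-Data.db"},
        };
        System.out.println(groupLastWins(input).get("ks.t1")); // nb-2-big-Data.db
        System.out.println(groupAll(input).get("ks.t1"));      // [nb-1-big-Data.db, nb-2-big-Data.db]
    }
}
```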

As part of https://issues.apache.org/jira/browse/CASSANDRA-16767 I introduced
a helper method on Descriptor:
{code:java}
public static Multimap<ColumnFamilyStore, Descriptor> fromFilenamesGrouped(Collection<String> filenames)
{code}
That should be used instead of the custom logic in
CompactionManager.forceUserDefinedCleanup.
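Once the input is grouped, the cleanup step has to visit every SSTable of every table rather than one per table. The sketch below uses hypothetical names (`CleanupLoopSketch`, `SSTableCleaner`, `cleanupAll`) and is not the actual `CompactionManager` code; the callback merely stands in for whatever per-SSTable work the real method performs. With a Guava `Multimap`, `descriptors.asMap()` provides the same per-table `Map<K, Collection<V>>` view that this loop expects.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CleanupLoopSketch {
    // Hypothetical callback standing in for the real per-SSTable cleanup work.
    public interface SSTableCleaner {
        void cleanup(String table, String sstable);
    }

    // Visits every SSTable of every table and returns how many were processed.
    public static int cleanupAll(Map<String, ? extends Collection<String>> grouped, SSTableCleaner cleaner) {
        int processed = 0;
        for (Map.Entry<String, ? extends Collection<String>> entry : grouped.entrySet()) {
            for (String sstable : entry.getValue()) {
                cleaner.cleanup(entry.getKey(), sstable);
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the log below is deterministic.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        grouped.put("ks.t1", List.of("nb-1-big-Data.db", "nb-2-big-Data.db"));
        grouped.put("ks.t2", List.of("nb-1-big-Data.db"));
        List<String> log = new ArrayList<>();
        int n = cleanupAll(grouped, (table, sstable) -> log.add(table + "/" + sstable));
        // prints: 3 [ks.t1/nb-1-big-Data.db, ks.t1/nb-2-big-Data.db, ks.t2/nb-1-big-Data.db]
        System.out.println(n + " " + log);
    }
}
```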

 

Broken existing code:
{code:java}
HashMap<ColumnFamilyStore, Descriptor> descriptors = Maps.newHashMap();
for (String filename : filenames)
{
    // extract keyspace and columnfamily name from filename
    Descriptor desc = Descriptor.fromFilename(filename.trim());
    if (Schema.instance.getCFMetaData(desc) == null)
    {
        logger.warn("Schema does not exist for file {}. Skipping.", filename);
        continue;
    }
    // group by keyspace/columnfamily
    ColumnFamilyStore cfs = Keyspace.open(desc.ksname).getColumnFamilyStore(desc.cfname);
    desc = cfs.getDirectories().find(new File(filename.trim()).getName());
    if (desc != null)
        descriptors.put(cfs, desc); // BUG: replaces any earlier Descriptor for the same cfs
}
{code}
 

Contents of the helper method introduced in the other ticket:
{code:java}
public static Multimap<ColumnFamilyStore, Descriptor> fromFilenamesGrouped(Collection<String> filenames)
{
    // ArrayListMultimap keeps every Descriptor added under the same key
    Multimap<ColumnFamilyStore, Descriptor> descriptors = ArrayListMultimap.create();
    for (String filename : filenames)
    {
        // extract keyspace and columnfamily name from filename
        Descriptor desc = Descriptor.fromFilename(filename.trim());
        if (Schema.instance.getCFMetaData(desc) == null)
        {
            logger.warn("Schema does not exist for file {}. Skipping.", filename);
            continue;
        }
        // group by keyspace/columnfamily
        ColumnFamilyStore cfs = Keyspace.open(desc.ksname).getColumnFamilyStore(desc.cfname);
        desc = cfs.getDirectories().find(new File(filename.trim()).getName());
        if (desc != null)
            descriptors.put(cfs, desc);
    }
    return descriptors;
}
{code}
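For readers without a Cassandra build at hand, here is a stdlib-only analogue of the grouping logic above. It assumes a simplified data directory layout of `<data>/<keyspace>/<table>-<tableId>/<file>`, uses a `Set` of `keyspace.table` names as a stand-in for the `Schema.instance` lookup, and keeps bare file names instead of `Descriptor` objects. The real `Descriptor.fromFilename` and `Directories.find` handle more cases than this, and the class and method names (`GroupByTableSketch`, `groupByTable`, `tableKey`) are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupByTableSketch {
    // Assumed simplified layout: <data>/<keyspace>/<table>-<tableId>/<sstable file>.
    // Returns "keyspace.table", or null if the path is too shallow to parse.
    static String tableKey(Path file) {
        Path tableDir = file.getParent();
        Path keyspaceDir = tableDir == null ? null : tableDir.getParent();
        if (keyspaceDir == null || keyspaceDir.getFileName() == null) return null;
        String dirName = tableDir.getFileName().toString();
        int dash = dirName.lastIndexOf('-');
        String table = dash > 0 ? dirName.substring(0, dash) : dirName;
        return keyspaceDir.getFileName() + "." + table;
    }

    // Keeps every file under its table key (list-per-key, like ArrayListMultimap);
    // files whose table is not in knownTables are reported and skipped.
    public static Map<String, List<String>> groupByTable(Collection<String> filenames, Set<String> knownTables) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String filename : filenames) {
            Path file = Paths.get(filename.trim());
            String key = tableKey(file);
            if (key == null || !knownTables.contains(key)) {
                System.err.println("Schema does not exist for file " + filename + ". Skipping.");
                continue;
            }
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(file.getFileName().toString());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
            "/var/lib/cassandra/data/ks/t1-0123456789abcdef0123456789abcdef/nb-1-big-Data.db",
            " /var/lib/cassandra/data/ks/t1-0123456789abcdef0123456789abcdef/nb-2-big-Data.db ",
            "/var/lib/cassandra/data/ks/gone-0123456789abcdef0123456789abcdef/nb-1-big-Data.db");
        // prints: {ks.t1=[nb-1-big-Data.db, nb-2-big-Data.db]} (the "gone" file is skipped)
        System.out.println(groupByTable(files, Set.of("ks.t1")));
    }
}
```

Both SSTables of `ks.t1` are kept, and the file for the unknown table is reported and skipped, matching the skip-and-warn behaviour of the original loop.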

  was:
User defined nodetool cleanup uses a HashMap instead of a MultiMap to group the 
user provided SSTables by table.  This means it only keeps one file per source 
table.

It also means the unit test for this component is not sufficient.

As part of https://issues.apache.org/jira/browse/CASSANDRA-16767  I introduced 
a helper method on Descriptor:
{code:java}
public static Multimap<ColumnFamilyStore, Descriptor> fromFilenamesGrouped(Collection<String> filenames)
{code}
That should be used instead of the custom logic in 
CompactionManager.forceUserDefinedCleanup


> User Defined nodetool cleanup only processes one SSTable per table
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-16772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16772
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Normal



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
