I've been using container pools for a while on one of our prod servers (built
with container pools from the start) and in test, but in the last few months,
since upgrading all of my servers to 8.1.5, I created new container pools on
all of the servers and switched everything over to back up to the new pools.
I've done some conversions where practical, but I'm doing some of that through
attrition. The conversion process is fairly painless and can be stopped and
restarted.
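For anyone who hasn't run a conversion yet, it's driven by CONVERT STGPOOL,
and the DURATION parameter is what lets you do it in chunks. A minimal sketch,
with placeholder pool names:

/* Convert legacy pool OLDPOOL into directory-container pool DCPOOL. */
/* DURATION caps this run at 4 hours; re-issue until the source pool */
/* is empty.                                                         */
convert stgpool OLDPOOL DCPOOL maxprocess=4 duration=240
/* Watch progress; Pct Util on the source pool drops as data moves.  */
query stgpool OLDPOOL format=detailed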
So far I'm really happy with the performance of container pools, and they're a
ton easier to manage.
I have a couple of questions for the more experienced container pool users.
1. As mentioned, exports don't work with container pools, but I still need to
move nodes between servers occasionally. I have 3 levels of service. The
bottom tier is for archive, with data stored on tape only, which is the
cheapest and slowest. Our customers will often decommission a server but want
to keep the backup data for some amount of time. For those people we used to
export them to the archive server. We can't do that anymore, which is a bit of
a problem. It seems like the only way to do this now is with node replication.
We're not using replication for anything else, and it seems a bit clunky since
a TSM server can only have one replication target for all nodes on the server.
It would be pretty much impossible for people who actually do node
replication. Is there another way to accomplish this?
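For reference, the one-time replication I have in mind would look roughly like
this (server and node names are placeholders, and it assumes the archive
server is already defined to the source server):

/* One-time push of a decommissioned node to the archive server.   */
set replserver ARCHSRV
update node RETIREDNODE replstate=enabled
replicate node RETIREDNODE wait=yes
/* Once the data is verified on ARCHSRV, stop tracking replication */
/* for this node on the source.                                    */
remove replnode RETIREDNODE

The SET REPLSERVER step is exactly the clunky part: it's server-wide, so it
would stomp on the target for anyone already replicating their other nodes
somewhere else.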
2. I've got a support case open with IBM on this, but we're kind of going in
circles. This is only happening on 1 of 8 servers that use container pools.
For my container directories I'm using 2 TB AIX/JFS2 file systems running off
fibre-channel-connected NetApps.
The server often fills those file systems right to the brim, with 0 bytes free
reported by df, which seems to be OK most of the time. In the last couple of
weeks I started getting errors like this:
Sep 26, 2018, 2:15:46 PM ANR0204I The container state for
/bucky1dc011/5a/0000000000005a8a.dcf is updated from AVAILABLE to UNAVAILABLE.
(PROCESS: 138)
Sep 26, 2018, 2:15:46 PM ANR3660E An unexpected error occurred while opening or
writing to the container. Container /bucky1dc011/5a/0000000000005a8a.dcf in
stgpool DCPOOL has been marked as UNAVAILABLE and should be audited to validate
accessibility and content. (PROCESS: 138)
Sep 26, 2018, 2:15:46 PM ANR0986I Process 138 for Move Container (Automatic)
running in the BACKGROUND processed 26,514 items for a total of 8,165,761,024
bytes with a completion state of WARNING at 14:15:46. (PROCESS: 138)
The file system reports as full:
$ df /bucky1dc011
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/fslv102 2145386496 2145386496 0 100% /bucky1dc011
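On the server side, QUERY STGPOOLDIRECTORY shows the same directories, and in
theory a full one could be taken out of the write rotation as a stopgap (the
read-only step below is just an idea, not something IBM has suggested):

/* Server-side view of the pool's container directories.             */
query stgpooldirectory stgpool=DCPOOL format=detailed
/* Possible stopgap: stop new writes to the full directory so ingest */
/* lands in the pool's other directories instead.                    */
update stgpooldirectory DCPOOL /bucky1dc011 access=readonly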
So I run an audit on the container. It immediately marks the container back as
available, even though the audit hasn't completed. The audit does complete
successfully, but the container is already back to unavailable before it
finishes:
Sep 26, 2018, 4:58:26 PM ANR4886I Audit Container (Scan) process started for
container /bucky1dc011/5a/0000000000005a8a.dcf (process ID 199). (SESSION:
531830, PROCESS: 199)
Sep 26, 2018, 4:58:26 PM ANR0984I Process 199 for AUDIT CONTAINER (SCAN)
started in the BACKGROUND at 16:58:26. (SESSION: 531830, PROCESS: 199)
Sep 26, 2018, 4:58:26 PM ANR0984I Process 198 for AUDIT CONTAINER started in
the BACKGROUND at 16:58:26. (SESSION: 531830, PROCESS: 198)
Sep 26, 2018, 4:58:27 PM ANR0204I The container state for
/bucky1dc011/5a/0000000000005a8a.dcf is updated from UNAVAILABLE to AVAILABLE.
(SESSION: 531830, PROCESS: 199)
Sep 26, 2018, 4:58:51 PM ANR3660E An unexpected error occurred while opening or
writing to the container. Container /bucky1dc011/5a/0000000000005a8a.dcf in
stgpool DCPOOL has been marked as UNAVAILABLE and should be audited to validate
accessibility and content. (PROCESS: 196)
Sep 26, 2018, 5:04:13 PM ANR4891I AUDIT CONTAINER process 199 ended for the
/bucky1dc011/5a/0000000000005a8a.dcf container: 29207 data extents inspected, 0
data extents marked as damaged, 0 data extents previously marked as damaged
reset to undamaged, and 0 data extents marked as orphaned. (SESSION: 531830,
PROCESS: 199)
Sep 26, 2018, 5:04:13 PM ANR0986I Process 199 for AUDIT CONTAINER (SCAN)
running in the BACKGROUND processed 29,207 items for a total of 8,043,383,903
bytes with a completion state of SUCCESS at 17:04:13. (SESSION: 531830,
PROCESS: 199)
Sep 26, 2018, 5:04:13 PM ANR4013I Audit container process 198 completed audit
of 1 containers; 1 successfully audited containers, 0 failed audited
containers. (SESSION: 531830, PROCESS: 199)
Sep 26, 2018, 5:04:13 PM ANR0987I Process 198 for AUDIT CONTAINER running in
the BACKGROUND processed 1 items with a completion state of SUCCESS at
17:04:13. (SESSION: 531830, PROCESS: 199)
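The audit itself is just the basic scan, something like:

/* Scan the flagged container and validate its extents. */
audit container /bucky1dc011/5a/0000000000005a8a.dcf action=scanall wait=yes

And right after it completes, the container still shows as unavailable: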
tsm: BUCKY1>q container /bucky1dc011/5a/0000000000005a8a.dcf f=d
Container: /bucky1dc011/5a/0000000000005a8a.dcf
Storage Pool Name: DCPOOL
Container Type: Dedup
State: Unavailable
Free Space(MB): 1,879
Maximum Size(MB): 10,104
Approx. Date Last Written: 09/26/2018 16:58:49
Approx. Date Last Audit: 09/26/2018 17:04:13
Cloud Type:
Cloud URL:
Cloud Object Size (MB):
Space Utilized (MB):
Data Extent Count:
It doesn't mark anything as damaged, but as soon as something touches the
container it goes back to unavailable; in this case an automatic container
move process hit it.
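The manual move is just the plain command (STGPOOLDIRECTORY to route data to a
specific directory is optional; I leave it off and let the server pick):

/* Try to move the extents off the flagged container. */
move container /bucky1dc011/5a/0000000000005a8a.dcf wait=yes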
It's not successful either:
Sep 26, 2018, 5:23:10 PM ANR0984I Process 215 for Move Container started in the
BACKGROUND at 17:23:09. (SESSION: 531830, PROCESS: 215)
Sep 26, 2018, 5:23:10 PM ANR2088E An I/O error occurred while reading container
/bucky1dc011/5a/0000000000005a8a.dcf in storage pool DCPOOL. (SESSION: 531830,
PROCESS: 215)
Sep 26, 2018, 5:23:10 PM ANR0985I Process 215 for Move Container running in the
BACKGROUND completed with completion state FAILURE at 17:23:10. (SESSION:
531830, PROCESS: 215)
Sep 26, 2018, 5:23:10 PM ANR1893E Process 215 for Move Container completed with
a completion state of FAILURE. (SESSION: 531830, PROCESS: 215)
After the failed move, the container is left in a read-only state:
tsm: BUCKY1>q container /bucky1dc011/5a/0000000000005a8a.dcf f=d
Container: /bucky1dc011/5a/0000000000005a8a.dcf
Storage Pool Name: DCPOOL
Container Type: Dedup
State: Read-Only
Free Space(MB): 1,881
Maximum Size(MB): 10,104
Approx. Date Last Written: 09/26/2018 16:58:49
Approx. Date Last Audit: 09/26/2018 17:04:13
Cloud Type:
Cloud URL:
Cloud Object Size (MB):
Space Utilized (MB):
Data Extent Count:
If I don't do the move and just leave the container as unavailable, then
PROTECT STGPOOL reports warnings.
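That's the standard pool protection run (MAXPROCESS here is just a
placeholder):

/* Protect the directory-container pool to its replica target. */
protect stgpool DCPOOL maxprocess=4 wait=yes

and every run warns about the unavailable container.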
Maybe someone else has encountered and fixed this problem. If so, I'd love to
know what you did.
Thanks!
-Kevin
-----Original Message-----
From: ADSM: Dist Stor Manager <[email protected]> On Behalf Of Alex Jaimes
Sent: Wednesday, September 26, 2018 13:26
To: [email protected]
Subject: Re: [ADSM-L] CONTAINER pool experiences
I echo Stefan, Rick and Luc... 110%
We've been using directory-container pools for about 2 years and they work great!
And yes, plan accordingly and monitor the TSM DB size as you migrate backups to
the container pools.
--Alex
On Wed, Sep 26, 2018 at 7:31 AM Michaud, Luc [Analyste principal -
environnement AIX] <[email protected]> wrote:
> Container pools saved the day here too!
>
> On our legacy environment (TSM717), adding dedup to our seqpools just
> bloated everything, until it became unbearable.
>
> Migrating nodes to the new blueprint replicated servers w/
> directory-container-pools solved a lot of our issues, especially with
> copy-to-tape, as rehydration is no longer required.
>
> We do have some apprehensions about the limitations around eventually
> migrating from copy-to-tape to copy-to-cloud, but we may cheat our way
> across with VTL-type gateways if need be.
>
> Luc
>
>