candlerb commented on issue #5684: Documentation: units for storageSize
URL: https://github.com/apache/pulsar/issues/5684#issuecomment-555921411
 
 
   I've made a simple test for this. Before the run:
   
   ```
   ubuntu@ldex-pulsar:~$ apache-pulsar-2.4.1/bin/pulsar-admin topics stats temp
   {
     "msgRateIn" : 0.0,
     "msgThroughputIn" : 0.0,
     "msgRateOut" : 0.0,
     "msgThroughputOut" : 0.0,
     "averageMsgSize" : 0.0,
     "storageSize" : 0,
     "publishers" : [ ],
     "subscriptions" : { },
     "replication" : { },
     "deduplicationStatus" : "Disabled"
   }
   ubuntu@ldex-pulsar:~$ du -sck apache-pulsar-2.4.1/data/
   2371392      apache-pulsar-2.4.1/data/
   2371392      total
   ```
   
   Then I ran a program that publishes 500 x 1 MB messages to the topic "temp" (see below).
   
   After the run:
   
   ```
   ubuntu@ldex-pulsar:~$ apache-pulsar-2.4.1/bin/pulsar-admin topics stats temp
   {
     "msgRateIn" : 0.0,
     "msgThroughputIn" : 0.0,
     "msgRateOut" : 0.0,
     "msgThroughputOut" : 0.0,
     "averageMsgSize" : 0.0,
     "storageSize" : 0,
     "publishers" : [ ],
     "subscriptions" : { },
     "replication" : { },
     "deduplicationStatus" : "Disabled"
   }
   ubuntu@ldex-pulsar:~$ du -sck apache-pulsar-2.4.1/data/
   3348208      apache-pulsar-2.4.1/data/
   3348208      total
   ```
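For reference, `du -k` reports sizes in 1 KiB blocks, so the growth of the data directory can be compared with the published payload. This is a minimal sketch using only the figures above; the variable names are illustrative.

```python
# du -sck output before and after the run, in 1 KiB blocks
DU_BEFORE_KIB = 2371392
DU_AFTER_KIB = 3348208

# Payload published by the test program: 500 messages of 1,000,000 bytes each
PUBLISHED_BYTES = 500 * 1_000_000

growth_bytes = (DU_AFTER_KIB - DU_BEFORE_KIB) * 1024
ratio = growth_bytes / PUBLISHED_BYTES

print("Data directory grew by %d bytes" % growth_bytes)
print("Growth / published payload = %.2f" % ratio)
```

The directory grew by about 1.0 GB, roughly twice the 500 MB payload. One possible explanation, not verified here, is that the data lands in both the BookKeeper journal and the ledger storage. Either way the messages clearly reached disk, while `storageSize` still reports 0.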
   
   Weird. Stats are supposed to update every minute, but:
   
   ```
   ubuntu@ldex-pulsar:~$ sleep 60; apache-pulsar-2.4.1/bin/pulsar-admin topics stats temp
   {
     "msgRateIn" : 0.0,
     "msgThroughputIn" : 0.0,
     "msgRateOut" : 0.0,
     "msgThroughputOut" : 0.0,
     "averageMsgSize" : 0.0,
     "storageSize" : 0,
     "publishers" : [ ],
     "subscriptions" : { },
     "replication" : { },
     "deduplicationStatus" : "Disabled"
   }
   ubuntu@ldex-pulsar:~$ apache-pulsar-2.4.1/bin/pulsar-admin topics stats persistent://public/default/temp
   {
     "msgRateIn" : 0.0,
     "msgThroughputIn" : 0.0,
     "msgRateOut" : 0.0,
     "msgThroughputOut" : 0.0,
     "averageMsgSize" : 0.0,
     "storageSize" : 0,
     "publishers" : [ ],
     "subscriptions" : { },
     "replication" : { },
     "deduplicationStatus" : "Disabled"
   }
   ubuntu@ldex-pulsar:~$ apache-pulsar-2.4.1/bin/pulsar-admin topics stats-internal persistent://public/default/temp
   {
     "entriesAddedCounter" : 500,
     "numberOfEntries" : 2632,
     "totalSize" : 2134036790,
     "currentLedgerEntries" : 500,
     "currentLedgerSize" : 500012872,
     "lastLedgerCreatedTimestamp" : "2019-11-20T09:24:19.756Z",
     "waitingCursorsCount" : 0,
     "pendingAddEntriesCount" : 0,
     "lastConfirmedEntry" : "122673:499",
     "state" : "LedgerOpened",
     "ledgers" : [ {
       "ledgerId" : 108697,
       "entries" : 1633,
       "size" : 1632046009,
       "offloaded" : false
     }, {
       "ledgerId" : 109540,
       "entries" : 499,
       "size" : 1977909,
       "offloaded" : false
     }, {
       "ledgerId" : 122673,
       "entries" : 0,
       "size" : 0,
       "offloaded" : false
     } ],
     "cursors" : { }
   }
   ```
   
   If I give an invalid topic name to `stats` (e.g. `tempz`), it tells me the topic does not exist, so I must be looking at the right topic. The retention period on this namespace is set to 1440 minutes, so the lack of consumers shouldn't be an issue; and `stats-internal` shows storage in the ledgers.
   
   So the more likely problem is that I don't understand what the "storageSize" field of "stats" actually represents.
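The `stats-internal` figures are at least self-consistent, which a short sketch can check. It assumes (a reading of the output above, not something confirmed by the documentation) that the `ledgers` list shows 0 for the currently open ledger and that its real size and entry count are in `currentLedgerSize` and `currentLedgerEntries`. The JSON is a trimmed copy of the output above; `check_totals` is an illustrative helper, not part of any Pulsar API.

```python
import json

# Trimmed copy of the stats-internal output (only the fields used here)
STATS_INTERNAL = json.loads("""
{
  "numberOfEntries": 2632,
  "totalSize": 2134036790,
  "currentLedgerEntries": 500,
  "currentLedgerSize": 500012872,
  "ledgers": [
    {"ledgerId": 108697, "entries": 1633, "size": 1632046009},
    {"ledgerId": 109540, "entries": 499, "size": 1977909},
    {"ledgerId": 122673, "entries": 0, "size": 0}
  ]
}
""")


def check_totals(stats):
    """Return (size_matches, entries_match) for a stats-internal dict."""
    ledger_size = sum(ledger["size"] for ledger in stats["ledgers"])
    ledger_entries = sum(ledger["entries"] for ledger in stats["ledgers"])
    # Assumption: listed ledgers plus the open ledger make up the totals
    size_matches = ledger_size + stats["currentLedgerSize"] == stats["totalSize"]
    entries_match = (
        ledger_entries + stats["currentLedgerEntries"] == stats["numberOfEntries"]
    )
    return size_matches, entries_match


size_ok, entries_ok = check_totals(STATS_INTERNAL)
print("totalSize consistent:", size_ok)
print("numberOfEntries consistent:", entries_ok)
```

Both checks hold for these figures, so about 2.1 GB is attributed to the topic by `stats-internal` even though `stats` reports a `storageSize` of 0.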
   
   --------
   
   ```
   from collections import defaultdict
   import time

   import pulsar

   NUM_MESSAGES = 500
   MESSAGE_SIZE = 1_000_000  # bytes per message

   client = pulsar.Client('pulsar://localhost:6650')

   # Compression is disabled so that stored size should track payload size.
   producer = client.create_producer(
       'temp',
       producer_name='fred',
       compression_type=pulsar.CompressionType.NONE,
   )

   sent = 0
   bytes_sent = 0
   results = defaultdict(int)  # number of callbacks seen per result code

   def ack(res, msg):
       """Callback for send_async(): tally the result and the bytes sent."""
       global sent, bytes_sent
       sent += 1
       bytes_sent += len(msg.data())
       results[str(res)] += 1

   # The same payload is reused for every message.
   data = b"x" * MESSAGE_SIZE
   for i in range(NUM_MESSAGES):
       producer.send_async(data, callback=ack)

   print("Flushing...")
   producer.flush()
   print("Flush complete")
   # Wait up to 60 seconds (600 x 0.1 s) for all callbacks to fire.
   for i in range(600):
       print("Sent: %d messages %d bytes" % (sent, bytes_sent))
       if sent == NUM_MESSAGES:
           break
       time.sleep(0.1)
   else:
       print("Never got all the messages!")
   print("Results: %r" % dict(results))

   producer.close()
   client.close()
   ```
   
